FOREWORD


This report is composed of reprints of papers published over the course of this program.
Each paper deals with some aspect of the design and capacity of optical pattern recognition
systems. Paper 1 describes architectures and implementations for optical correlators
based on acoustooptic devices. Paper 2 analyzes the capacity of optical correlators for
image recognition. Paper 3 describes optical correlators which use optical disks,
rather than AO cells, as spatial light modulators. Papers 4 and 5 are concerned with
optical image recognition systems based on neural network models. The optical associative
memory described in these papers uses a liquid crystal spatial light modulator, thin
holographic plates and a CCD to process two dimensional images. Papers 6-9 deal with
the use of photorefractive crystals to increase the adaptability and capacity of optical
image processing systems. Paper 6 describes a system which uses AO cells and a bismuth
silicon oxide crystal to implement a time integrating correlator. Paper 7 explores the use
of volume holographic materials, such as photorefractive crystals, as media for storing
spatial information at high densities. Various constraints are derived for the storage and
reconstruction of holographic information. Papers 8 and 9 describe methods for using the
dynamic nature and high storage capacity of photorefractive crystals to construct artificial
neural networks. Paper 10 presents an analysis of optical image recognition systems
based on binary filters. Paper 11 describes an extension of correlation optical associative
memories to higher orders which results in higher storage capacity.
Introduction

Two dimensional correlation for performing image recognition is one of the earliest
examples of optical information processing, suggested by Vander Lugt [1], and it still remains
one of the most promising application areas for this technology. The reasons optical image
correlators have not found widespread usage in pattern recognition systems to this day fall
into two categories: algorithmic limitations and lack of devices. Correlation-based pattern
recognition algorithms have well known limitations such as scale and rotational sensitivity.
Over the years algorithms based on optical correlation have been developed [2-5] which show
promise for overcoming some of the algorithmic limitations; however, they have not been put
to a real practical test because of the lack of two dimensional spatial light modulators
that are necessary for the implementation of the classic Vander Lugt correlator.
Specifically, two separate two dimensional SLMs are needed, one for recording the input
image and another for the reference.

Recently, there has been considerable progress in the development of spatial light
modulators which has led to several "real time" implementations of optical correlators using
two dimensional devices [6-8]. In this paper we report on the use of acoustooptic
devices (AODs), which are by far the most highly developed SLMs available. AODs however are
one dimensional SLMs, and in order to process two dimensional signals in an acoustooptic
image correlator we use a combination of temporal and spatial integrations [9-11]. In this
paper we briefly review the principles of operation of acoustooptic image correlators and
then present the results of several experiments that were performed with real time systems
that we have assembled.

In the architectures we will discuss, an AOD is used as the transducer for the input
image that is to be recognized. Another aspect we will explore is ways of making the
reference image programmable rather than being fixed on a hologram recorded on
photographic film. We will see that time domain processing, an inherent part of the
acoustooptic image correlators, offers a very convenient way of introducing a programmable
reference and creates a set of promising algorithmic possibilities that are not realizable
with a fixed reference.


General method


Single transducer acoustooptic devices are one dimensional spatial light modulators and
hence they cannot be used to represent an input image in its entirety at one time.
Typically the space bandwidth product of the AOD is about equal to the number of resolvable
spots along only one line of the image we want to process. As a result, the strategy for
building an acoustooptic image correlator is to process the image one line at a time using a
space integrating optical processor, and to accumulate by temporal integration on a two
dimensional detector array the results of the space integrating part of the system to form
the full two dimensional correlation. The basic idea will be explained with the aid of
Figure 1. The operation we wish to perform is a two dimensional correlation:


$$g(x',y') = \iint f(x,y)\, h^{*}(x+x',\, y+y')\, dx\, dy. \qquad (1)$$


The input image f(x,y) is imaged onto a TV camera that electronically scans the image and
produces a video signal. The video signal is heterodyned to the appropriate center
frequency and then applied to the piezoelectric transducer of the AOD. Each horizontal video
line propagates separately in the AOD and modulates the incident light. Since in our
processors the images are processed as individual lines, from now on we will denote the
input image as f(x,nd), where n is an integer and d is the line spacing in the vertical
direction. The optical system is a multichannel one dimensional correlator that produces the
correlation between each input video line in the AOD and all the lines of the reference
image which is stored in the optical system. The light incident on the two dimensional CCD
detector at the output is modulated by:


$$g_n(x',y') = \int f(x,nd)\, h^{*}(x+x',\, y')\, dx \qquad (2)$$
Figure 1. Two dimensional acoustooptic image correlator.


Where x' and y' are the spatial coordinates at the output plane. In order to complete the
two dimensional operation, we need to shift the two dimensional pattern g_n(x',y') by a
distance nd in the y' direction and then sum over n (i.e. accumulate the signal from all the
different input lines). The required shift and add operation can be accomplished very
conveniently by continually scrolling the photogenerated charge on the CCD during each video
frame. After the nth input horizontal line is scanned by the TV camera and the correlation
between this nth input line and the reference is added onto the previous contents of the
CCD, the CCD is then triggered to electronically transfer the entire charge pattern
vertically by one pixel. This procedure results in the formation of a charge pattern on the
CCD that can be expressed as follows:

$$g(x',y') = \sum_{n} g_n(x',\, y'+nd) = \sum_{n} \int f(x,nd)\, h^{*}(x+x',\, nd+y')\, dx \qquad (3)$$

The above is recognized as a sampled form of the two dimensional correlation in Eq. (1). The
two dimensional correlation is produced continuously at the frame rate of the input TV
camera and it appears at the CCD output in the form of a standard video signal that can be
directly displayed.
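
The shift and add procedure can be checked numerically. The sketch below is only an illustration under stated assumptions, not a model of the optical hardware: images are small real-valued arrays (so the complex conjugate is omitted), everything outside an array is taken as zero, and the function names are hypothetical. It accumulates the one dimensional line correlations on a simulated CCD that scrolls by one row per input line, and compares the result with a direct evaluation of the sampled two dimensional correlation.

```python
def correlate_1d(line, ref_line):
    """c[q] = sum_k line[k] * ref_line[k + q] for every shift q that can overlap."""
    n_in, n_ref = len(line), len(ref_line)
    out = []
    for q in range(-(n_in - 1), n_ref):
        total = 0
        for k in range(n_in):
            if 0 <= k + q < n_ref:
                total += line[k] * ref_line[k + q]
        out.append(total)
    return out


def correlate_2d_direct(f, h):
    """Sampled 2-D correlation: g[p][q] = sum_{n,k} f[n][k] * h[n + p][k + q]."""
    rows_f, cols_f = len(f), len(f[0])
    rows_h, cols_h = len(h), len(h[0])
    out = []
    for p in range(-(rows_f - 1), rows_h):
        row = [0] * (cols_f + cols_h - 1)
        for n in range(rows_f):
            if 0 <= n + p < rows_h:
                partial = correlate_1d(f[n], h[n + p])
                row = [a + b for a, b in zip(row, partial)]
        out.append(row)
    return out


def correlate_2d_scrolling(f, h):
    """Same result built line by line: add the 1-D correlations, then scroll."""
    rows_f, cols_f = len(f), len(f[0])
    rows_h, cols_h = len(h), len(h[0])
    width = cols_f + cols_h - 1
    ccd = [[0] * width for _ in range(rows_f + rows_h - 1)]
    for n in range(rows_f):
        if n > 0:
            # Scroll the accumulated charge by one row before the next line.
            ccd = [[0] * width] + ccd[:-1]
        for m in range(rows_h):
            # Input line n correlated against reference line m lands on CCD row m.
            partial = correlate_1d(f[n], h[m])
            ccd[m] = [a + b for a, b in zip(ccd[m], partial)]
    return ccd


if __name__ == "__main__":
    f = [[1, 2, 0], [0, 1, 3]]
    h = [[1, 0, 2, 1], [2, 1, 0, 0], [0, 3, 1, 2]]
    print(correlate_2d_scrolling(f, h) == correlate_2d_direct(f, h))
```

Scrolling before each new line (rather than after it) is an arbitrary bookkeeping choice in this sketch; it only fixes which CCD row corresponds to zero vertical shift.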

Within this basic framework, there are several possible architectural variations,
principally through choosing different implementations for the multichannel one dimensional
correlator. In references [10, 11] we have described in detail two specific architectures.
In what follows we present results from the experimental demonstration of these systems.




Holographic acoustooptic image correlator

The holographic image correlator is shown in Fig. 2. The details of the operating
principles have been reported in reference [10]. Here we describe the experimental
apparatus. The input image was sensed with a high resolution TV camera. The video signal
from the TV camera was heterodyned to the center frequency of the AOD (50 MHz), amplified
and applied to the AOD. The acoustooptic device in this experiment was a TeO2 (Crystal
Technology # *0503) device with 35 MHz bandwidth and 70 microsecond delay. This was more
than adequate to accommodate one standard video line (63 microseconds and 5 MHz). After
approximately 52.7 microseconds from the start of the horizontal clock the signal in the AOD
is an acoustic replica of the video line from the input image. At that instant the laser
diode is triggered to produce a short pulse to freeze and read out the signal in the AOD.
The laser diode used in the experiments was RCA CS6030I with peak power equal to *0
milliwatts and pulsewidth equal to 50 nanoseconds. The pulsewidth must be chosen equal to or
shorter than the inverse of the bandwidth of the video signal so that the motion of the
signal in the light diffracted by the AOD can be neglected. The video bandwidth in these
experiments was 5 MHz or less. The light diffracted by the AOD was Fourier transformed in
the horizontal direction and expanded vertically to illuminate a one dimensional Fourier
transform hologram of the reference image. The light diffracted by the hologram was imaged
vertically and transformed horizontally to produce the multiple one dimensional correlations
at the CCD plane.
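
As a quick arithmetic check of the timing figures quoted above, the three conditions can be written out explicitly. The symbols are introduced here for illustration only, and the AOD delay is read as 70 microseconds, which is an assumption about a partly illegible figure in the source.

```latex
\begin{align*}
\tau_{\mathrm{pulse}} &\le \frac{1}{B_{\mathrm{video}}} = \frac{1}{5\ \mathrm{MHz}} = 200\ \mathrm{ns},
  & 50\ \mathrm{ns} &< 200\ \mathrm{ns}, \\
T_{\mathrm{AOD}} &\ge T_{\mathrm{line}},
  & 70\ \mu\mathrm{s} &> 63\ \mu\mathrm{s}, \\
B_{\mathrm{AOD}} &\ge B_{\mathrm{video}},
  & 35\ \mathrm{MHz} &> 5\ \mathrm{MHz}.
\end{align*}
```

During a 50 nanosecond pulse the acoustic signal moves by at most a quarter of the shortest resolvable period of a 5 MHz video signal, which is why its motion can be neglected.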

The hologram was fabricated in dichromated gelatin, yielding efficiency 35% at the 820 nm
wavelength. The intensity of the light was detected at the CCD plane. The horizontal clock
from the CCD is used to trigger the driving electronics of the input TV camera and to
transfer the charge in the array downwards by one pixel. The CCD device used in the
experiments was a SONY XC-37 whose driving electronics were modified to allow us to scroll
the charge on the CCD continuously during each frame. This CCD camera has 384 pixels in the
horizontal direction and 491 pixels in the scrolling direction. The scrolling motion of the
CCD completes the 2-D correlation as described earlier, and the full 2-D correlation between
the image to which the TV camera is pointed and the reference image stored on the Fourier
transform hologram is produced at 30 frames per second and displayed on a monitor.

A photograph of the experimental apparatus is shown in Figure 3, showing that the system
is relatively simple and compact. The laser diode is on the right and the CCD on the left
side of the photograph in Figure 3.


Figure 2. Holographic acoustooptic image correlator.


Figure 3. Experimental setup of the system in Figure 2.


An example of the experiments that were done with this system is shown in Figure 4. The
two faces shown in Figure 4 were used as inputs. On the left hand side of the
photograph the reconstructions of the two faces are displayed. These reconstructions were
obtained by pointing the TV camera to an impulse-like function and thus producing at the
output the impulse response of the system. It is evident that the impulse response of the
system is an edge enhanced version of the original in order to suppress the
crosscorrelations. This was accomplished by fabricating the hologram so that higher
frequency components were enhanced. All four possible correlations and their cross sections
are shown in Figure 4 in two dimensional display. The results show excellent discrimination
between the two faces.

The photograph shown in Figure 5 is an isometric view of the autocorrelation of one of
the patterns in Figure 4. A very sharp correlation peak is obtained, which is significant
because the correlation produced with this system is partially incoherent. In a fully
incoherent correlator that operates on light intensity [12], the correlation peak is
typically broad and rests on top of a bias plateau. In the system we are describing in
this section the correlation in the horizontal direction is performed by coherent amplitude
integration, whereas in the vertical direction it is done by integrating intensity. This
incoherent integration averages out the coherent noise effects (i.e. speckle and related
phenomena) and yet the correlation pattern that is produced is bias free and its sharpness
in both dimensions is characteristic of coherent correlators.


Figure 4. Experimental demonstration of the holographic acoustooptic correlator.

Figure 5. Isometric view of the autocorrelation of one of the patterns in Figure 4.


The results that we obtained from the system described above have convinced us that
acoustooptics can essentially solve the device limitations of optical image correlators, at
least as far as the real time input stage is concerned. If we ask what are the limitations
of this system in terms of performing adequately in a pattern recognition application, the
answer is clearly not that the optical system does not perform correlations well enough. The
real issue now is how to use an optical correlator to recognize images. We have come to the
conclusion that a digitally programmable reference is a key feature that needs to be
incorporated in optical correlators in order to make their applicability to pattern
recognition practical.

The most obvious way to introduce a programmable reference to the system in Figure 2 is
by recording the Fourier transform hologram on a real time SLM rather than on photographic
film. We have investigated the use of the Litton magnetooptic spatial light modulator [13]
A photograph of the experimental apparatus is shown in Figure 8. The LED array used in the
experiment is fabricated by Honeywell and it consists of 180 elements, each separated by 100
microns. Each element transmits 5 microwatts of optical power. In the experiment we will
describe, only 32 elements of the array were used. The electronic memory that is needed for
storing the reference image was built with standard RAM chips and it was designed to store
binary images consisting of 32X100 pixels. The memory was interfaced to the optical system
through an array of 32 LED drivers and also to an IBM personal computer which was used to
generate the reference patterns. The AOD and the CCD were the same devices that were used in
the experiments described earlier.


Figure 7. Incoherent LED acoustooptic correlator.


Figure 8. Experimental setup of the system in Figure 7.


A sample of the experimental results obtained with this system is shown in Figure 9. The
input pattern (Fig. 9a) contains the word "LED" in two places. The reference image was chosen
in this case to be the word LED, and shown in Figure 9b is the reference image as displayed
on the screen of the computer. The output of the optical correlator is shown in Figures 9c and
as a spatial filter in optical correlators [14]. The magnetooptic device (MOD) we are using
is structured as a two dimensional array of 128X128 pixels. Each pixel can be electrically
set to one of two possible magnetization states. Hence this is a binary spatial light
modulator. A feature of the device that is most interesting in terms of recording holograms
is the fact that it is a bipolar light modulator [14]. This implies that the average
amplitude transmittance of the hologram can be made zero, which allows an on-axis Fourier
transform hologram to be recorded and read out. A very simple algorithm for recording
computer generated Fourier transform holograms on this device was described and demonstrated
in reference [14]. The real part of the Fourier transform of the reference image is examined
for each pixel. If it is positive, the corresponding pixel of the MOD is set to one of its
two states, and if it is negative, it is set in the opposite state. This algorithm has
yielded excellent results in a conventional two dimensional optical correlator. It can be
modified for the acoustooptic correlator in Figure 2 by simply calculating the one
dimensional Fourier transform of the reference image with a digital computer, then recording
on the MOD the sign of the real part of this transformed image. A preliminary experimental
result with a system similar to the one in Figure 2, with an MOD replacing the hologram, was
obtained. The results are shown in Figure 6. The input pattern that was imaged onto the TV
camera was the letter X shown in Figure 6a. The one dimensional Fourier transform hologram
of the letter X, computed and then recorded on the MOD, is shown in Figure 6b. The
autocorrelation of the letter X that was produced in real time as a video signal by the CCD
and displayed on a monitor is shown in Figure 6c.
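
The recording rule described above (one MOD state where the real part of the one dimensional transform is positive, the opposite state where it is negative) can be sketched as follows. This is a minimal illustration and not the procedure of reference [14]: the transform is a plain discrete Fourier transform taken along each row, the two magnetization states are represented as +1 and -1, the names `dft_1d` and `binary_hologram` are hypothetical, and mapping a real part of exactly zero to +1 is an arbitrary choice that the text does not specify.

```python
import cmath


def dft_1d(line):
    """Discrete Fourier transform of one image line (direct O(n^2) sum)."""
    n = len(line)
    return [
        sum(line[k] * cmath.exp(-2j * cmath.pi * u * k / n) for k in range(n))
        for u in range(n)
    ]


def binary_hologram(reference):
    """Sign of the real part of the row-wise DFT, one bipolar value per pixel."""
    return [
        # Assumption: a real part of exactly zero is assigned to the +1 state.
        [1 if coeff.real >= 0 else -1 for coeff in dft_1d(row)]
        for row in reference
    ]


if __name__ == "__main__":
    reference = [[1, 2, 0, 0], [1, 0, 0, 0]]
    for row in binary_hologram(reference):
        print(row)
```

Only the horizontal direction is transformed, matching the statement that the hologram in Figure 2 is a one dimensional Fourier transform hologram; the vertical direction is left in image space.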


Figure 6. Experimental demonstration of the holographic acoustooptic correlator with a
programmable magnetooptic device. (a) Input. (b) Hologram. (c) Correlation output.


Incoherent LED acoustooptic correlator

In this section we present results from an experimental demonstration of the incoherent
correlator that is described in detail in reference [11]. A schematic diagram of this
processor is shown in Figure 7. The basic architecture follows from Figure 1. The difference
between this and the system discussed in the previous section is the choice of the
multichannel one dimensional correlator. The system in Figure 7 utilizes an array of
incoherent time integrating correlators rather than coherent space integrating correlators.
The temporal signal modulating each of the LEDs in Figure 7 is correlated against the signal
that is launched into the acoustooptic device. These correlations are formed on separate
lines of the CCD at the output of the system. The reference image is stored in electronic
memory which can be read out in parallel such that each line of the reference image can
temporally modulate a separate LED. The electronic memory is triggered to read out its
contents in synchronism with the horizontal clock from the CCD. As each new input video line
is entered in the AOD, it is correlated against all the lines of the reference image. As in
the previous architecture, the CCD is triggered to scroll in synchronism with the horizontal
sync of the input TV camera, which at the end of each video frame results in the formation
of the full two dimensional correlation between the input image and the reference image
stored in the electronic memory.
9d. These are pictures of the same thing with the contrast of the display monitor adjusted
to display all the cross correlations in Figure 9c, whereas in Figure 9d only the
autocorrelation peaks are visible. The crosscorrelations in this case are higher when
compared with the holographic correlator. This is because in an incoherent correlator only
positive values can be directly represented. We are currently working on incorporating in
this system the capability to represent bipolar images. This permits much greater
flexibility in choosing the reference image, which in turn permits the selection of the
reference image to minimize the crosscorrelations and obtain performance comparable with the
coherent correlator.


Figure 9. Experimental demonstration of incoherent LED correlator. (a) Input,
(b) Reference image. (c),(d) Correlation output.


Acknowledgements

We acknowledge Litton Data Systems for their support and the loan of the magnetooptic
device that was used in the experiments. We also acknowledge Mr. Ootob Oolob at the Sony
Corporation for his help with the CCD used in the experiments.

The work reported in this paper is funded by grants from the Army Research Office, the
Air Force Office of Scientific Research and General Dynamics.
ABSTRACT

The capacity of the Vander Lugt correlator, defined as the maximum number of separate images that can be
recognized, is estimated. The increase in capacity that results from the use of a volume hologram in place of the
commonly used planar hologram is derived. The effects of binarizing the reference filter and the shift invariant
properties of the two classifying systems are also analyzed.


I. Introduction

Vander Lugt correlators have been used for a long time in optical pattern recognition [1]. In the typical
implementation, shown in figure 1, the Fourier transform of an input image is used to read out a hologram containing
the Fourier transform of a reference image. This diffracted beam is then inverse Fourier transformed to produce
the correlation between the input and reference images on the output plane. Vander Lugt correlators are typically
used as pattern recognition systems. Whether or not a peak is present at the output of the correlator determines
whether or not the input image is sufficiently close to the stored reference. Recently much work has been done on
the use of a Vander Lugt correlator for pattern classification [2],[3]. In this case, the correlator distinguishes whether
the input is a member of one of two classes, with each class being composed of many images. Typically, a reference
filter is formed as a linear combination of the images in both classes and the presence or absence of a peak at the
correlation plane determines which class the input belongs to.


In this paper, we discuss the capacity of the Vander Lugt correlator. This is to say we estimate the maximum
number of images that can be stored in the reference filter before the system begins to misclassify images. This
capacity has been studied in great detail for systems without shift invariance (e.g. perceptrons). The classic results
from pattern recognition about the capacity of a linear discriminant function do not directly apply in this case
because the Vander Lugt correlator is shift invariant. In this paper, we will discuss the capacity of the system
incorporating the shift invariance of the Vander Lugt correlator. We will also discuss the effect on the capacity of
binarizing the reference filter, and lastly we will demonstrate that by using a volume hologram to record the filter,
the capacity of the system is greatly increased and the system becomes capable of multi-class classification.

II. Capacity of Linear Filters


In the most common pattern classification scheme, the inner product is performed between the input image
d(x,y) composed of N pixels and a reference filter h(x,y).

$$O = \sum_{x}\sum_{y} d(x,y)\, h(x,y) \qquad [1]$$

Comparing the output O with a pre-set threshold determines which of the two classes the input belongs to. A
standard method of forming the reference filter is as a linear combination of the images in both classes:

$$h(x,y) = \sum_{i=1}^{M} w_i\, d_i(x,y) \qquad [2]$$


The weights w_i can be chosen through a variety of training algorithms such as the perceptron learning algorithm.
It is a well known result that the capacity of such a system is [4]

$$M = 2N \qquad [3]$$


where N is the number of pixels in each image. In this paper, we will consider the construction of a simpler filter
in which the weights are binary.

$$w_i = \begin{cases} 1 & \text{if } d_i \in \text{Class I} \\ 0 & \text{if } d_i \in \text{Class II} \end{cases} \qquad [4]$$
In other words, the filter is formed by simply summing the images belonging to class 1, while ignoring those in
class 2. This is implemented in a Vander Lugt correlator by multiply exposing the hologram to the images in class
1 while doing nothing for the images in class 2. Classification can then be performed by detecting and thresholding
the output at the center of the correlation plane. For the remainder of the paper, we will assume that the images
d(x, y) consist of N binary pixels, each pixel being a bipolar (i.e. 1 or -1), independent random variable. Under
these assumptions, the capacity of the Vander Lugt correlator using the peak-only detection scheme can be found
by solving the following transcendental equation [5]:


$$M = \frac{N}{4\log(M^{3}/N)} \qquad [5]$$

As N → ∞, the above expression asymptotically approaches

$$M \to \frac{N}{8\log N} \qquad [6]$$
Thus the use of the simpler method for constructing the reference filter results in a relatively modest loss in
capacity by a factor of 16 log N.


III. Capacity of Shift Invariant Filters

Because Vander Lugt correlators are inherently shift invariant, it is possible to classify prescribed images and
their shifted versions as well. In order to implement a shift invariant classification scheme, detection at the output is
done over the entire correlation plane. As a result the detection of a peak anywhere in the output plane determines
whether the input is a member of class 1 or a shifted form of a member in class 1. Figure 2a shows a cross section
through the origin of the digital correlation of an input image with a filter containing only one image. The resulting
output shows a single correlation peak and relatively small sidelobes. When the reference is constructed by adding
3 images (figure 2b), the sidelobe structure shows a significant rise in amplitude. However, since only the single
correlation peak lies above the threshold, classification of the input image is still performed correctly. However,
when the number of reference images is increased to 6 (Fig. 2c), there are now two peaks which lie above the threshold
level. As a result, the system can no longer decide whether the input image is a member of class 1 or a shifted
version of a member of class 1. Therefore, we expect that the capacity of the shift invariant system is smaller. For
the relatively simple method of filter construction, we can readily derive an analytic capacity for the shift invariant
correlator.


Fig. 2. Digital correlations of a shift invariant filter


In the shift invariant case, the Vander Lugt system performs a correlation between one of the input images
d(x, y) and the reference filter h(x, y)

$$O(x,y) = \sum_{x'}\sum_{y'} d(x',y')\, h(x'+x,\, y'+y). \qquad [7]$$

For the case where the filter is constructed by simply summing the images in class 1 (multiple exposure) and
assuming the same input statistics for each image, the capacity of the shift invariant Vander Lugt system is given
by the solution of the following transcendental equation [5]


$$M = \frac{N}{4\log(M^{3}N)} \qquad [8]$$

Asymptotically, the capacity approaches

$$M \to \frac{N}{16\log N} \qquad [9]$$

Thus,  the  capacity  is  decreased  by  only  a  factor  of  two  from  that  of  the  non  shift  invariant  system. 
This result is important since there is to our knowledge no prior estimate for the loss in capacity due to shift
invariance. For the case considered here (the filter derived as a simple sum), the loss is very small: a factor of 2.
To verify the theoretical capacity of the correlator, 100 computer trials were averaged to determine the capacity
for various N. For each trial, two random vectors were generated to form the initial reference filter. Each image
was correlated to determine whether classification was performed correctly. If no error occurred, a new random
image was added to the reference filter and correlation with all the images was done. The number of images in the
reference was increased until a misclassification occurred. At this point, the capacity was said to be one less than
the number of images stored in the reference.
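
The trial procedure described above can be sketched in a few lines. This is a minimal sketch and not the simulation code used for the paper. It assumes one dimensional images with circular shifts, a detection threshold of N/2, and that a stored image is classified correctly when the central output exceeds the threshold and, in the shift invariant mode, no off-center output does. The threshold, the correctness test and all function names are assumptions introduced for illustration.

```python
import random


def circular_correlation(image, filt):
    """out[s] = sum_k image[k] * filt[(k + s) mod n] for every circular shift s."""
    n = len(image)
    return [sum(image[k] * filt[(k + s) % n] for k in range(n)) for s in range(n)]


def classified_correctly(image, filt, shift_invariant):
    """Assumed test: center above N/2, and (if shift invariant) sidelobes not above it."""
    threshold = len(image) / 2
    if not shift_invariant:
        center = sum(a * b for a, b in zip(image, filt))
        return center > threshold
    out = circular_correlation(image, filt)
    return out[0] > threshold and all(v <= threshold for v in out[1:])


def random_image(n_pixels, rng):
    """N independent bipolar (+1 or -1) pixels."""
    return [rng.choice((-1, 1)) for _ in range(n_pixels)]


def capacity_trial(n_pixels, shift_invariant, rng):
    """Add random images to the sum filter until one stored image is misclassified."""
    images = [random_image(n_pixels, rng) for _ in range(2)]
    while True:
        filt = [sum(column) for column in zip(*images)]
        if not all(classified_correctly(img, filt, shift_invariant) for img in images):
            # Capacity is one less than the number of images stored at failure.
            return len(images) - 1
        images.append(random_image(n_pixels, rng))


def mean_capacity(n_pixels, shift_invariant, trials=100, seed=0):
    """Average capacity over independent trials with a reproducible seed."""
    rng = random.Random(seed)
    total = sum(capacity_trial(n_pixels, shift_invariant, rng) for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    for n in (16, 32, 64):
        print(n, mean_capacity(n, False, trials=20), mean_capacity(n, True, trials=20))
```

With this correctness test the shift invariant criterion is strictly harder to satisfy than the peak-only one, so for the same sequence of random images the shift invariant trial can never report the larger capacity.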

Figure 3 shows the capacity of both the peak only and shift invariant systems as a function of the number of
pixels, N, in the image. Experimental simulations show good agreement with theoretical predictions. It is important
to note that because the simulations were performed in the regime of small N, the transcendental equations for the
capacity (eqs [5] and [8]) were used to plot the theoretical curves.

IV. Capacity of Binary Filters

As  demonstrated  above,  the  capacity  of  the  VanderLugt  system  can  be  very  large.  One  potential  limitation 
that  might  prevent  us  from  actually  implementing  such  a  large  pattern  classification  system  is  the  accuracy  with 
which  the  hologram  can  record  the  reference.  To  get  a  feel  for  the  susceptibility  of  the  system  to  nonlinearities  and 
inaccuracies, we considered the capacity of the Vander Lugt correlator when the reference filter has been binarized.


In this case, the reference filter consists of a thresholded version of the filter generated from the multiple
exposure algorithm

$$h(x,y) = \mathrm{sgn}\left[\sum_{m=1}^{M} f_m(x,y)\right] \qquad [10]$$

Again, assuming that the input pixels consist of bipolar independent random variables, we find that the capacity
of the binary Vander Lugt correlator is asymptotically

$$M \approx \frac{N}{4\pi\log N} \qquad [11]$$

There is only a further π/2 reduction in capacity from that of the non-binarized shift invariant filter.

Fig. 4. Digital correlations of the linear and binary filters: (a) linear (N = 256, M = 3); (b) binary (N = 256, M = 3).

In figure 4, a comparison is made between the linear and binary filters. In both cases, the input images
had 256 pixels and the reference filter contained 3 images. As seen from the figure, the sidelobe level of the binary
correlator is significantly larger than that for the linear filter. As a result, as additional images are added to the
reference filter, the binary correlator will begin to misclassify sooner. This will correspondingly lead to a lower
capacity (theoretical and experimental) for the binary correlator. In figure 5, the capacities of both the binarized
and nonbinarized filters are plotted as a function of the number of pixels in the image. Again, computer simulations
demonstrate good agreement with theoretical predictions.

Fig. 5. Capacity curves of the shift invariant and binary filters.

V. Capacity of the Volume VanderLugt Correlator

In this section, we consider the use of a volume hologram to record the reference filters in a Vander Lugt
correlator [6]. We expect that because information is recorded in three dimensions as opposed to the two dimensions
for plane holograms, the storage capacity of the volume VanderLugt correlator is increased.


Let us first consider how a volume Vander Lugt correlator operates (Fig. 6). Consider the correlation between
two point sources. In the recording stage (Fig. 6a), the point source generates a plane wave which interferes with
a reference wave to form a grating which is recorded in the volume hologram. When an input point source at the
same position is presented to the correlator (Fig. 6b), a new plane wave reads out the stored grating. The diffracted
plane wave is then focused to form the expected correlation peak at the output. If, however, the input point source
is shifted in the direction parallel to the plane of incidence (Fig. 6c), the plane wave that is generated will not be
Bragg matched with the grating in the volume hologram. Consequently no diffracted wave will be produced and
no correlation spot will be formed. In the direction perpendicular to the plane of incidence, the volume hologram
exhibits very little Bragg sensitivity and a correlation can still be read out. As a result, shifts of the input in a
direction parallel to the plane of incidence will not be recognized, while in the perpendicular direction the correlator
remains shift invariant.

For an arbitrary input, A(x, y), and reference image, R(x, y), it can be shown that the output of the volume
Vander Lugt correlator is [7]

$$O(x,y) = \left[A(x,y) \star R(x,y)\right]\,\mathrm{sinc}(\alpha x) \qquad [12]$$

where $\alpha = T\sin\theta / 2\lambda F$ and $\star$ is the correlation operator. T is the thickness of the hologram,
θ is the Bragg angle, and F is the focal length of the inverse Fourier transform lens. In other words, the output of
the correlator consists of the correlation between the input and reference apodized by a sinc function whose width
is determined by the thickness of the volume hologram.

Fig. 6. Recording and readout of a volume hologram.

To experimentally demonstrate this apodizing effect, the autocorrelation of an O was performed using the
volume Vander Lugt correlator. Figure 7 shows a digitally generated autocorrelation of an O which simulates a
standard Vander Lugt correlator with a reference O recorded on a plane hologram. In the volume Vander Lugt
correlator, the reference O was recorded on a lithium niobate crystal measuring 25 x 25 x 5 mm. The reference beam
was situated such that the plane of incidence was in the horizontal direction. Figure 8a shows the output of the
volume Vander Lugt correlator when the input O is positioned at the same position as the reference O. The output
consists of the standard correlation of the two O's multiplied by the horizontal sinc function. When the input O is
shifted in the direction parallel to the plane of incidence (Fig. 8b), the correlation shifts and only correlation structure
to one side of the peak is present at the output. The smaller spot lying to the right of the primary horizontal
band corresponds to the very strong correlation peak lying in the first sidelobe of the apodizing sinc function.
Further shifts of the input, as shown in figure 8c, merely read out the correlation structure further from the peak.
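
The apodizing effect can be illustrated with a one-dimensional numerical sketch along the direction parallel to the plane of incidence. This is only an illustration under stated assumptions: the normalized form sin(πt)/(πt) for the sinc function, the sampling of the output plane at a fixed `pixel_pitch`, and the function names are choices made here, not details given in the text.

```python
import math

def sinc(t):
    """Normalized sinc, sin(pi t)/(pi t); the normalization is an assumption."""
    return 1.0 if t == 0 else math.sin(math.pi * t) / (math.pi * t)

def correlate(a, r):
    """Full linear cross-correlation of two real sequences.

    Returns (shifts, values) where values[k] = sum_i a[i + s] * r[i]
    for the shift s = shifts[k].
    """
    shifts = list(range(-(len(r) - 1), len(a)))
    values = []
    for s in shifts:
        total = 0.0
        for i, rv in enumerate(r):
            j = i + s
            if 0 <= j < len(a):
                total += a[j] * rv
        values.append(total)
    return shifts, values

def volume_output(a, r, thickness, bragg_angle, wavelength, focal_length,
                  pixel_pitch):
    """Correlation of a with r multiplied by sinc(alpha * x), as in Eq. [12].

    alpha = T sin(theta) / (2 lambda F); x is the output coordinate, taken
    here as shift * pixel_pitch (a hypothetical sampling of the output plane).
    """
    alpha = thickness * math.sin(bragg_angle) / (2 * wavelength * focal_length)
    shifts, values = correlate(a, r)
    apodized = [v * sinc(alpha * s * pixel_pitch)
                for s, v in zip(shifts, values)]
    return shifts, apodized
```

For example, `volume_output(a, a, 5e-3, math.radians(30), 500e-9, 0.3, 1e-5)` uses illustrative parameter values (none are taken from the experiment) and leaves the zero-shift peak unchanged while attenuating the correlation structure away from it.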


Fig. 7. Digital autocorrelation of an O.

Fig. 8. Experimental outputs of the volume Vander Lugt correlator.

The Bragg selectivity in the volume VanderLugt correlator allows one to perform multi-class categorization of
the input images [8]. In the recording stage (Fig. 9a), a set of reference filters is recorded by interfering each with a
reference beam separated by the angular bandwidth of the volume hologram. When an input image is presented
to the volume Vander Lugt correlator (Fig. 9b), a set of correlations is performed simultaneously and presented
spatially distributed at the output. The Bragg selectivity of the hologram guarantees that the correlation bands
will not interfere with each other. As a result, detecting in which band the correlation peak appears determines
which of many classes the input image belongs to.

Fig. 9. Recording and readout stages of the multi-class categorization volume Vander Lugt correlator.

We can consider each correlation band as a separate output channel performing a simple pattern classification
task independent of the other channels. By assuming the same input statistics for the images in each class, the
capacity of each output channel can be analytically derived. In this case, the maximum number of images that can
be stored was found to be equal to that of the standard VanderLugt correlator [eq. 5]. Asymptotically, the capacity
of each channel approaches

$$M \approx \frac{N}{4\log N}$$

The number of output channels, K, that can be stored in the volume hologram is

$$K = \frac{TL\sin\theta}{\lambda F} \qquad (1 < K < N)$$

where L is the actual dimension of the output detector array in the direction parallel to the plane of incidence.
Hence the total capacity of the system is

$$\frac{KN}{4\log N}$$

Thus the effect of using a volume hologram is that the capacity is increased by the number of output channels
the hologram can support, and one is able to perform multi-class classification. However, one drawback is the partial
loss of shift invariance in one direction that results from the use of a volume hologram.

VI. Conclusion

In conclusion, we have demonstrated that the capacity of a Vander Lugt correlator without shift invariance is
N/(4 log N) for the simple additive filter. By incorporating the shift invariance inherent in an optical correlator, the
capacity is only decreased by a factor of 2. Furthermore, by binarizing the reference filter, there is a further loss by a
factor of π/2. However, by utilizing a volume hologram to record the reference filter, the capacity of the correlator
is increased by a factor that can be as high as N with a proportional loss in shift invariance.

OPTICAL DISK BASED CORRELATION ARCHITECTURES

Demetri Psaltis, Mark A. Neifeld and Alan Yamamura

Department of Electrical Engineering
California Institute of Technology
Pasadena, CA 91125

In this paper we describe and experimentally demonstrate optical image correlators
that are implemented using optical memory disks. Optical correlation for pattern
recognition [1] has long been considered a promising application for optical processing. One of
the reasons such correlators have not been used in practical applications yet has been the
lack of suitable spatial light modulators to be used as real time input devices. Recently,
this limitation has to a large extent been removed through the development of a variety of
2-D SLMs [2] and concepts that allow the utilization of mature 1-D (acoustooptic) SLMs
[3]. Attention has therefore shifted to the design of appropriate filters to perform reliable
recognition [4]. In most practical applications a single filter is not sufficient to produce
reliable recognition, and the use of spatial [5] and temporal [3] multiplexing to search
through a library of filters emerges as the most straightforward solution to the problem.
The optical disk correlator architectures we describe in this paper provide an extremely
efficient method for performing this task since they combine in a single device the huge
memory required for storage of the library of reference images, the spatial light modulator
needed to represent the reference in the optical correlator, and the scanning mechanism to
temporally search through the library.

The first architecture we will describe is shown in Fig. 1. Each reference image is
recorded as a 2-D computer generated Fourier transform hologram on the disk. The
input image goes through the beamsplitter, is Fourier transformed by the lens, and
illuminates the hologram on the disk. The reflected light contains a term proportional to
the product of the transforms of the input and reference images. The same lens
retransforms the reflected light and the correlation is produced. A principal issue of concern in
this architecture is the suitability of commercially available disk systems for recording and
reconstruction of holograms. We have identified a write-once disk system which is
manufactured with glass (rather than plastic) covers of sufficient optical quality to allow
us to reconstruct the recorded data using coherent light. We will report the results of this
experiment at the conference. The rotation of the disk is used to perform a search through
images centered at the same radial position on the disk. An auxiliary scanning mechanism
is needed in order to position the correlator "head" in the correct radial position. As
the disk rotates, the entire correlation pattern shifts in one dimension at the output as
long as the reference hologram remains in the field of view. A time-delay-and-integrate
(TDI) CCD sensor can be used to integrate this traveling correlation pattern in order to
improve sensitivity. Alternatively, a 1-D parallel read-out detector array can be used that
sequentially produces slices of the 2-D correlation pattern as it travels past the detector
array.

A straightforward modification of the system of Fig. 1 is obtained by recording
holograms that are Fourier transforms of the reference images only in the radial dimension,
since the rotation of the disk provides the necessary shift between the input and reference
along the tracks. The light reflected from such a hologram is Fourier transformed in the
radial direction and integrated in the orthogonal dimension onto a 1-D parallel read-out
array. The signal from the detector array is again the 2-D correlation presented as a
sequence of 1-D slices. The advantage of this architecture compared to the previous one is
that it has the same light efficiency as the TDI system without the relative complication
of the TDI sensor. Therefore the experiments we will present are with this type of system.

The above architectures require storage of the reference images in the form of computer
generated Fourier transform holograms. This provides the advantage of shift invariance,
which means that we do not need to be concerned with accurate positioning within a single
track of the correlation head with respect to the data recorded on the disk. This is a very
important practical consideration; the disadvantage however is an increase by a factor of
100 or more in the space bandwidth product required to record the hologram compared to
the space bandwidth product of the image itself and an increased computational overhead
to record the disk. In addition, the smaller size of the recording results in a reduced phase
uniformity requirement for the disk. In many cases it is only necessary to record the
reference images as binary patterns [6], in which case they can be directly recorded on the
disks. Gray scale images can be recorded using some form of area modulation, as is done
with video disks for example.

There are two types of architecture we will discuss that allow the reference images
themselves to be stored on the disk rather than their Fourier transforms. The first is shown
in Fig. 2. The input image goes through the beamsplitter and is Fourier transformed
by lens L1. A Fourier transform hologram of the input is recorded in a photorefractive
crystal using a reference beam that is incident from the right, as shown in the figure. Once
the hologram is recorded, the input is blocked and the disk is illuminated. L1 takes
the Fourier transform of the reference image that is in the field of view of the illuminating
beam, and L3 transforms the light diffracted by the hologram to produce the correlation at
the output plane. The rotation of the disk is used to search through a library of images in
the radial direction, and a TDI detector can be used at the output to increase sensitivity as
before. Multiple holograms could be multiplexed in the crystal to address different radial
positions on the disk, or the entire head can be scanned to address different radial positions
as before. We have not yet completed the experimental demonstration of this system, but
we expect to present the experimental results from it at the conference.

The final architecture we will discuss is shown in Fig. 3. The advantage of this
architecture is that it operates on the light intensity and consequently the requirement for
phase uniformity is greatly relaxed. As a result it is possible to implement this architecture
with most existing disk systems. This correlator works as follows. The reference images
are recorded on the disk and the input is imaged through a 1-D scanning device onto
the disk. The scanner can be either acoustooptic (as shown in Fig. 3) or a rotating
mirror. It provides the relative displacement in the radial direction between the input
and reference images that is necessary to calculate the correlation function. The disk
rotation provides the displacement in the orthogonal direction. The scanner translates
the input image completely across the stored reference image each time the disk rotates
by a distance equal to a pixel of the reference. The intensity of the light reflected from
the disk at any one time is proportional to the product between the input and a shifted
version of the reference. The reflected light is collected (integrated) on a single detector
which produces as its output a temporal video signal of the 2-D correlation. This system
was experimentally demonstrated with acoustooptic scanners. Two types of acoustooptic
scanners can be used. The first is a "flying spot" scanner in which a chirp signal propagates
in the acoustooptic device, acting as a traveling lens that scans the diffracted image at a rate
equal to the acoustic velocity. This system completes a scan in a few μs; therefore a complete 2-D
correlation takes approximately a few ms. The second scanner that we have demonstrated
is a more conventional acoustooptic deflector that scans slowly but permits a higher
space-bandwidth product of the input image. A sample of experimental results obtained with
the system of Fig. 3 is shown in Fig. 4. Fig. 4a is a photograph of the pattern recorded
on a write-once disk (the acronym CIT) and Fig. 4b is the 2-D correlation produced by
the optical system of Fig. 3 and displayed by raster scanning the detector output on a
2-D monitor. Correlations can be produced with our experimental apparatus at rates up
to 1000 images of 100 x 100 pixels per second. The optically calculated correlation is in good
agreement with the expected autocorrelation function of the CIT pattern. It should be
pointed out that since this system operates on intensity we can only represent positive
quantities. In order to represent bipolar input and/or reference images we need to add
biases at the input stage and subtract them from the output [3], a technique that has been
successfully used in a variety of incoherent architectures.
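
One way to see how a bias lets an intensity-only system handle bipolar data is the following sketch. The particular scheme shown (mapping +1/-1 pixels to 1/0, using a circular correlation, and removing a shift-independent bias term afterwards) is an assumption chosen for illustration; the text states only that biases are added at the input and subtracted from the output.

```python
def circular_correlation(a, r):
    """c[s] = sum_i a[(i + s) mod n] * r[i] for every shift s."""
    n = len(a)
    return [sum(a[(i + s) % n] * r[i] for i in range(n)) for s in range(n)]

def to_intensity(bipolar):
    """Map +1/-1 pixels to 1/0 so they can be represented as light intensity."""
    return [(p + 1) // 2 for p in bipolar]

def bipolar_correlation_from_intensity(a, r):
    """Recover the bipolar correlation from a non-negative (intensity) one.

    With a' = (a + 1)/2 and r' = (r + 1)/2, each product satisfies
    a*r = 4*a'*r' - 2*a' - 2*r' + 1, so summing over n pixels gives
    c[s] = 4*c'[s] - (2*sum(a') + 2*sum(r') - n) for every shift s.
    """
    n = len(a)
    a_int, r_int = to_intensity(a), to_intensity(r)
    measured = circular_correlation(a_int, r_int)   # what the detector sees
    bias = 2 * sum(a_int) + 2 * sum(r_int) - n      # independent of the shift
    return [4 * c - bias for c in measured]
```

Because the bias term does not depend on the shift, it can be removed from the detector signal as a constant offset after scaling.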

The number of bits that can be stored in the type of disk that we use for most of
our work (a write-once, 12 cm diameter system from SONY) is more than 5 billion. The
number of 100 x 100-pixel images that can be stored in such a disk is more than 5,000,
assuming a generous factor of 100 for loss of space-bandwidth product due to representation
(e.g. area modulation for gray scale representation). The rate at which all these images
can be interrogated for a possible match with the input is limited by one or more of the
following factors: the scanning speed of the disk (40 Hz in our case), the speed of the radial
scanning mechanism, and the sensitivity and the bandwidth of the output detectors and
the electronics following them. As an example consider the system of Fig. 2. At a 40 Hz disk
rotation rate, we obtain 1000 image correlations per 1/40th of a second (i.e. 40,000 image
correlations per second), yielding a reasonable 4 MHz bandwidth per detector. It would be
extremely difficult to duplicate this capability electronically, and it can be achieved with
existing optical technology. Moreover, it is precisely such capability that is required for
practical pattern recognition problems.
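
The storage and throughput figures quoted above can be checked with a few lines of arithmetic. The sketch below uses only the numbers given in the text; reading the 4 MHz figure as one output sample per image row for each detector is an interpretation, and the function name `disk_budget` is hypothetical.

```python
def disk_budget(disk_bits=5_000_000_000, image_side=100, overhead=100,
                rotation_hz=40, images_per_revolution=1000):
    """Return (images on disk, correlations per second, detector sample rate)."""
    pixels_per_image = image_side * image_side
    # Each pixel is assumed to cost `overhead` bits of disk capacity.
    images_on_disk = disk_bits // (pixels_per_image * overhead)
    correlations_per_second = rotation_hz * images_per_revolution
    # Assumption: each detector delivers one sample per image row.
    detector_rate_hz = correlations_per_second * image_side
    return images_on_disk, correlations_per_second, detector_rate_hz

print(disk_budget())  # (5000, 40000, 4000000)
```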

The research reported in this paper is supported by the Army Research Office.

1. INTRODUCTION

Several types of optical associative memories have been proposed
over the years. Ghost image type associative memory
was suggested by van Heerden¹ and was investigated by
others, including Gabor.² In a ghost image holographic associative
memory, a hologram of a pattern A is made using the
pattern B as the reference. If the hologram is illuminated by
A, then the output becomes A • A * B, where • denotes
correlation and * denotes convolution. If A is a noise-like
random phase object, then the output can be well approximated
by B. If the space-bandwidth product (SBP) of the
output images is equal to the SBP of the hologram, then this
type of associative memory can store only one pair of associations.
If the SBP of the hologram exceeds the SBP of the
output patterns, then the number of associations that can be
simultaneously stored on the same hologram is equal to the
ratio of the two SBPs. Random phase diffusers can be used to
improve the quality of the reconstructed images by making
the effective bandwidth of the pattern A larger, which makes
A • A closer to the ideal delta function. The diffuser also
makes A₁ • A₂ closer to zero for the cross terms when multiple
associations are stored on the same hologram. Willshaw et
al. discussed optical memories quite similar to the ghost type
and also suggested using thresholding at the output for reducing
the cross-correlation "noise" when multiple associations
are stored.

A second class of associative memories can be constructed
as an array of holographic correlators that compare the
input and a bank of reference patterns. If a correlation peak is
detected above threshold, the associated pattern is produced.

More recently, following the resurgence of interest in
neural network models of computation, several new
holographic memories have been proposed. In this paper
we are concerned with the holographic memory proposed by
Psaltis and Farhat. This memory can be thought of as a
compromise between the ghost and correlation peak detection
memories. We discuss this point with reference to Fig. 1,
where the three possible implementations are shown. In the
ghost image memory shown in Fig. 1(a), the input pattern f
is correlated against all stored memories, and the correlation
function is then convolved with the associated stored output
pattern. The final result is obtained by summing all of the
reconstructed images. The peak detection memory shown in
Fig. 1(b) detects the presence of the peak in the correlation
plane to determine the best match between the input and one
of the stored input images. Once the match has been established,
only the corresponding memory is illuminated, which
eliminates the crosstalk and the distortion present in the ghost
holograms. The final possibility, shown in Fig. 1(c), is to
sample the correlation plane only at the origin, where autocorrelation
peaks occur, using an array of pinholes (rather than to
actually detect the peak). The spatially sampled correlation
peak rather than the entire correlation plane is then convolved
with the associated stored output image. This eliminates the
distortion present in the ghost image holograms but does not
entirely eliminate the crosstalk. Consequently, compared to
the ghost holograms, the quality of the recalled images is
dramatically improved in this case; however, the number of
patterns that can be stored in the same hologram is reduced,
compared with the peak detection type. The crosstalk at the
output of the system can be reduced by thresholding if the
stored patterns are binary, and further improvement can be
realized for autoassociations (i.e., the input and output stored
patterns are the same) through the use of feedback. The
advantages of the latter type of memory are its simplicity,
since no active devices are needed at the intermediate level,
and its robustness with respect to failure of components. In
the peak detection memory, if the element that senses the
correlation peak of a particular stored pattern fails, the entire
memory is erased. Moreover, this type of memory (also
referred to as outer product memory) is widely used in the
modeling of neural networks, and it generalizes very naturally
to multilayered networks.

In this paper we describe in detail two holographic associative
memories that utilize the pinhole sampling method. Both
memories are constructed as an array of VanderLugt correlators.
Multiple images are multiplexed on Fourier transform
holograms. A single hologram is used in the first architecture,
an autoassociative memory, since the input and output patterns
are the same. The second architecture utilizes two separate
holograms for storing the input and output patterns and
can be either auto- or heteroassociative. The experimental
demonstration of both systems operating as autoassociative
memories is described.


2.  ARCHITECTURES 

To  construct  a  two-dimensional  outer  product  associative 
memory,  we  need  to  implement  the  following  operation: 

$$h(x,y) = \iint T(x,y,\xi,\eta)\, f(\xi,\eta)\, d\xi\, d\eta , \qquad (1)$$

where

$$T(x,y,\xi,\eta) = \sum_{m=1}^{M} h_m(x,y)\, f_m(\xi,\eta) . \qquad (2)$$

$T(x,y,\xi,\eta)$ is the synaptic matrix in accordance with the outer
product storage mechanism, $f_m(\xi,\eta)$ is the mth input memory,
$h_m(x,y)$ is the associated output pattern, and M is the
number of memories stored. $f(\xi,\eta)$ is the input function to the
memory, and $h(x,y)$ is its output. In the remainder of this
paper we concern ourselves with autoassociative memories, in
which $h_m(x,y) = f_m(x,y)$. To implement Eq. (1), we need a
four-dimensional interconnection matrix that cannot be directly
implemented optically. However, if we substitute Eq.
(2) into (1) and rearrange, we end up with the following
inner product representation:

$$h(x,y) = \iint \left[\sum_{m=1}^{M} f_m(x,y)\, f_m(\xi,\eta)\right] f(\xi,\eta)\, d\xi\, d\eta
= \sum_{m=1}^{M} \left[\iint f_m(\xi,\eta)\, f(\xi,\eta)\, d\xi\, d\eta\right] f_m(x,y) . \qquad (3)$$


From Eq. (3) we deduce that the optical implementation of
this memory can be decomposed into three steps. First, the
inner product of the input and each memory must be formed.
This can be optically evaluated as the correlation sampled at
the origin. Second, each inner product must be multiplied by
the associated memory. Finally, these products must be
summed over all of the memories to produce the final result.

Fig. 2. Optical setup for the recording of the holographic memory (labels: images to be stored, Fourier transform lens, hologram).

Fig. 3. Schematic diagram of the holographic memory system.
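
The three steps of the inner product form can be illustrated numerically for images flattened to vectors. This is a minimal sketch, not a model of the optics: the function names are hypothetical, and the final sign threshold is included only because the text notes that thresholding can reduce crosstalk for binary patterns.

```python
def inner_product(f, g):
    """Discrete counterpart of the double integral of f_m * f."""
    return sum(p * q for p, q in zip(f, g))

def recall(memories, probe):
    """Inner product form of Eq. (3): sum over m of <f_m, probe> * f_m."""
    weights = [inner_product(f_m, probe) for f_m in memories]   # step 1
    output = [0] * len(probe)
    for w, f_m in zip(weights, memories):                       # steps 2 and 3
        for i, pixel in enumerate(f_m):
            output[i] += w * pixel
    return output

def recall_outer(memories, probe):
    """Outer product form of Eqs. (1) and (2), for comparison."""
    n = len(probe)
    t = [[sum(f[i] * f[j] for f in memories) for j in range(n)]
         for i in range(n)]
    return [sum(t[i][j] * probe[j] for j in range(n)) for i in range(n)]

def threshold(h):
    """Binarize the recalled pattern to +1/-1."""
    return [1 if v >= 0 else -1 for v in h]
```

For two orthogonal stored patterns and a probe equal to the first pattern with one pixel flipped, `threshold(recall(memories, probe))` returns the first pattern, and `recall` agrees with `recall_outer` while never forming the full interconnection matrix.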

The images are stored in a conventional Fourier transform
hologram, as shown in Fig. 2. All of the memories to be
stored are arranged side by side, spatially separated from each
other. In the analysis, it is assumed that each memory is
separated by the same distance along the x and y directions.
The Fourier spectrum of all of the memories interferes with a
single tilted plane wave to simultaneously make a multiple
hologram. The amplitude distribution at the input plane is

$$\sum_{m=1}^{M} f_m(\xi - a_m,\, \eta - b_m) , \qquad (4)$$

where $a_m$ and $b_m$ are the positions of the mth image in the ξ
and η directions, respectively. When we record the interference
on a holographic plate at the hologram plane, the
amplitude transmittance of the hologram becomes

$$\left|\sum_{m=1}^{M} F_m(u,v)\exp[-j(ua_m + vb_m)] + \exp(-ju\xi_0)\right|^2
= \sum_{m=1}^{M} F_m^*(u,v)\exp\{j[u(a_m - \xi_0) + vb_m]\} + \text{conjugate term} + \text{dc terms} , \qquad (5)$$

where $F_m(u,v)$ is the Fourier transform of $f_m(x,y)$ and $\xi_0$ is a
constant that determines the angle of incidence of the reference
beam.

The system used to recall the information stored on the
hologram, shown in Fig. 3, is a modified VanderLugt correlator.
The input pattern is placed at plane P1 and is Fourier
transformed by lens L1. The transform illuminates the holographic
memory placed at the Fourier plane P2. The correlations
of the input and each memory are produced at plane P3
by lens L2. The inner product values are sampled by an array
of pinholes at P3. Each pinhole is positioned exactly where
each of the stored images was centered when the Fourier
transform hologram was recorded. Therefore, if the input is
one of the stored images centered on the optical axis, then a
sharp autocorrelation peak will form at P3 on one of the
pinholes. Light emerging from each pinhole is reflected by a
mirror placed immediately after the pinholes, and the reflected
light illuminates the hologram to form the reconstructed
images of all of the memories at the output plane P4.
The reconstruction due to light from each pinhole is the entire
composite memory shifted by the position of the pinhole. At
the origin at plane P4 we obtain the superposition of all of the
memories. The strength with which each memory is represented
in this superposition is proportional to the value of the
inner product between the input and the corresponding memory.
A window is placed at P4 to select only the desired
central portion of the reconstructed holographic images.

We now describe the operation of this system analytically.
Let $f(\xi,\eta)$ be the amplitude transmittance at plane P1 in Fig.
3. Then, the term of interest in the amplitude of the light
diffracted by the hologram is

$$\sum_{m=1}^{M} F(u,v)\,F_m^{*}(u,v)\exp\{j[u(a_m - c_0) + v b_m]\}. \tag{6}$$

At the correlation output plane P3, the light amplitude is the
Fourier transform of Eq. (6):

$$\sum_{m=1}^{M} g_m(-x',-y') * \delta(x' - a_m + c_0,\; y' - b_m), \tag{7}$$

where $g_m(x',y')$ is the correlation of $f_m(\xi,\eta)$ with $f(\xi,\eta)$, and
$x'$, $y'$ are the coordinates in plane P3. The correlation output is
sampled by the pinhole array located at coordinates $x' = a_m - c_0$,
$y' = b_m$ in P3. We assume that the pinholes can be
adequately described mathematically as delta functions.
Then, the light reflected by the mirror at plane P3 can be
written as

$$\sum_{m=1}^{M}\left[g_m(-x',-y') * \delta(x' - a_m + c_0,\; y' - b_m)\right]\,\delta(x' - a_m + c_0,\; y' - b_m) = \sum_{m=1}^{M} g_m(0,0)\,\delta(x' - a_m + c_0,\; y' - b_m). \tag{8}$$

The reflected light illuminates the hologram in P2, and the
amplitude of the light traveling from right to left in Fig. 3
immediately to the left of P2 is given by

$$\sum_{m=1}^{M}\sum_{m'=1}^{M} g_m(0,0)\,F_{m'}^{*}(u,v)\exp\{-j[u(a_m - a_{m'}) + v(b_m - b_{m'})]\}. \tag{9}$$

Fig. 4. Closed-loop version of the holographic memory system.

The light at the output plane P4 is the Fourier transform of
Eq. (9). Note that in the above equation, unless m = m', the
spectra $F_{m'}^{*}(u,v)$ will emerge on a high spatial frequency carrier,
which means that they will be reconstructed off-axis at
P4. The total light amplitude at P4 can be written as

$$\sum_{m=1}^{M}\sum_{m'=1}^{M} g_m(0,0)\,f_{m'}^{*}(x + a_m - a_{m'},\; y + b_m - b_{m'}). \tag{10}$$

When we observe the light only through a window that is
centered around the optical axis and is equal in size with each
memory, only the terms m = m' survive:

$$\sum_{m=1}^{M} g_m(0,0)\,f_m^{*}(x,y). \tag{11}$$

If $f_m$ is real,

$$\sum_{m=1}^{M} g_m(0,0)\,f_m(x,y). \tag{12}$$

Comparing the result in Eq. (12) with Eq. (3), we see that the
optical system we described is exactly the outer product
associative memory.
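
As a numerical illustration of Eq. (12), the following Python sketch forms the inner products g_m(0,0) between a probe and each stored pattern and then the weighted superposition of the memories. It is only a digital analogy of the optical system, not the authors' implementation: the function name `recall`, the random bipolar test patterns, and the array sizes are assumptions made for this example.

```python
import numpy as np

def recall(stored, probe):
    """Eq. (12)-style recall: sum over m of g_m(0,0) * f_m,
    where g_m(0,0) is the inner product of memory f_m with the probe."""
    stored = np.asarray(stored, dtype=float)   # shape (M, H, W)
    probe = np.asarray(probe, dtype=float)     # shape (H, W)
    # Inner products: each correlation sampled at zero shift (the pinhole).
    weights = np.tensordot(stored, probe, axes=([1, 2], [0, 1]))  # shape (M,)
    # Weighted superposition of all stored memories.
    output = np.tensordot(weights, stored, axes=(0, 0))           # shape (H, W)
    return weights, output

# Hypothetical test data: four random bipolar 16 x 16 patterns.
rng = np.random.default_rng(0)
memories = rng.choice([-1.0, 1.0], size=(4, 16, 16))

# Partial input: the third memory with its right half blanked out.
probe = memories[2].copy()
probe[:, 8:] = 0.0

weights, output = recall(memories, probe)
best = int(np.argmax(weights))   # index of the most strongly recalled memory
```

The other entries of `weights` are the crosstalk terms discussed next; they are nonzero in general, which is why the raw output is a mixture of all memories rather than a clean copy of one.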

If the input pattern is most similar to the stored image
$f_{m_0}(x,y)$, then the correlation between the input and $f_{m_0}$ will
be the strongest, and consequently [from Eq. (12)], $f_{m_0}(x,y)$
will be amplified the most in the final output reconstruction.
However,  there  is  still  crosstalk,  since  all  of  the  other  mem¬ 
ories  are  also  weakly  read  out.  This  crosstalk  can  be  elimi¬ 
nated  if  the  stored  images  are  binary,  in  which  case  the  output 
can  be  thresholded  and  fed  back  to  the  input  for  multiple 
iterations.'  A  closed-loop  version  of  the  holographic  memory 
is  shown  in  Fig.  4.  The  light  at  the  output  is  detected  on  a 
two-dimensional  CCD.  The  video  signal  from  the  CCD  is 
electronically  thresholded  and  fed  back  to  the  input  plane  of 
the  system  through  an  electronically  addressed  spatial  light 
modulator.  The  Litton  magneto-optic  spatial  light  modula¬ 
tor'^  is  one  candidate  device  that  can  be  used  for  this  purpose. 
The  system  also  can  be  configured  with  optical  feedback 
using  an  optically  addressed  spatial  light  modulator. 
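
The threshold-and-feed-back idea can be sketched numerically. The example below is a simplified stand-in for the closed loop, under stated assumptions: the stored images are bipolar (+1/-1) rather than optical intensities, the threshold is a hard sign function at zero, and the four mutually orthogonal stripe patterns produced by the hypothetical helper `make_memories` are chosen only so that the behavior is easy to verify.

```python
import numpy as np

def make_memories(num=4, side=8):
    """Hypothetical test set: mutually orthogonal +/-1 stripe patterns,
    each flattened to a vector of side*side pixels."""
    k = np.arange(side * side)
    return np.array([1 - 2 * ((k >> m) & 1) for m in range(num)], dtype=float)

def iterate(memories, state, max_iters=10):
    """Repeat: inner products -> weighted superposition -> hard threshold,
    until the state stops changing. Returns (state, number of passes)."""
    for n in range(1, max_iters + 1):
        weights = memories @ state                     # g_m(0,0) for this pass
        new_state = np.where(weights @ memories >= 0, 1.0, -1.0)  # threshold at zero
        if np.array_equal(new_state, state):
            return new_state, n
        state = new_state
    return state, max_iters

memories = make_memories()
probe = memories[2].copy()
probe[:10] *= -1                    # corrupt 10 of the 64 pixels
final_state, passes = iterate(memories, probe)
```

With these patterns the first pass already restores the third memory, and the second pass only confirms that the state no longer changes.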

An  alternative  implementation  of  this  type  of  associative 
memory  is  shown  in  Fig.  5.  This  architecture  is  basically  an 
unfolded  version  of  the  system  in  Fig.  3.  In  other  words, 
instead  of  having  a  mirror  that  reflects  the  light  back  through 
the  same  system,  we  have  two  identical  optical  systems,  one 
after the other. This allows us to use two separate holograms,
which provides added flexibility in designing the system and
also  makes  heteroassociations  possible.'*  ”  The  system 
shown  in  Fig.  5  is  an  autoassociative  memory  with  feedback 
and  thresholding.  An  input  image  enters  the  system  through 
the  beamsplitter,  as  shown  in  the  figure,  and  is  thresholded  by 
an  optically  addressed  spatial  light  modulator.  Several  de¬ 
vices  can  be  used  for  this  purpose,  such  as  the  Hamamatsu 
microchannel  spatial  light  modulator^  or  the  Hughes  liquid 
crystal light valve. The optical system from plane P1 to
plane P3 is a modified VanderLugt correlator similar to the
one used previously. The correlation patterns are sampled by
an array of pinholes at P3, and the light emerging from plane
P3 illuminates a second, identical system. The light reaching
plane P1 is the superposition of all of the images that have
been stored in the multiplexed holograms. Each image is
weighted by the inner product between the pattern recorded
on the spatial light modulator from the previous iteration and
itself.  Thus,  the  systems  shown  in  Figs.  4  and  5  are  function¬ 
ally  identical.  As  will  be  seen  when  we  describe  the  experi¬ 
mental  demonstration  of  the  two  systems,  the  added  flexibil¬ 
ity  of  the  system  in  Fig.  5  can  significantly  improve  the 
performance. 

3.  EXPERIMENTAL  RESULTS 

The experimental apparatus assembled to demonstrate the
memory of Fig. 3 is shown in Fig. 6. The multiplexed Fourier
transform hologram (item 3) was fabricated in dichromated
gelatin. The pinhole array (5) was made by drilling four holes
on a thin metal plate. The diameter of each pinhole was 350
μm. The pinhole array was then placed in contact with a
mirror. A CCD camera (7) was used to detect the output of the
memory through the beamsplitter (6). The four patterns used
as the memories in this experiment are shown in Fig. 7. The
patterns obtained at the correlation plane (or equivalently, the
plane of the pinholes) when the first and third stored patterns
were presented at the input are shown in Figs. 8(a) and 8(b),
respectively. A sharp autocorrelation peak is evident in both
cases, and the position of these peaks coincides precisely with
the position of two of the four pinholes. It is interesting to
note that the inclusion of the pinholes destroys the shift
invariance of the VanderLugt correlator. If the input pattern is
shifted from its nominal position, then the correlation peak
shifts as well; consequently, it misses the pinhole, and thus no
light is reflected. If the pinholes are removed, however, we
are back to ghost holography. The recalled image obtained
when the pinholes are removed and the first memory is placed
at the input plane is shown in Fig. 9. This image was obtained
at the output of the system (on the CCD) with the masking
window removed. The image is obviously highly distorted,
and there is no apparent favoring of the correct memory. A
dramatic improvement is obtained when the pinholes are
included (Fig. 10, with partial inputs also displayed). In this
case, each memory is faithfully reconstructed. The shift
invariance of the system can be restored by including a
quadratic nonlinearity in the correlation plane.

Fig. 8. Correlation outputs for (a) the first and (b) the third memories
at the input.
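
The loss of shift invariance caused by sampling at fixed pinholes can be illustrated with a short Python sketch. It uses an FFT-based circular correlation as a digital stand-in for the optical correlator; the random binary test image, its size, the shift of (3, 5) pixels, and the helper name `correlation_plane` are all assumptions made for illustration.

```python
import numpy as np

def correlation_plane(reference, probe):
    """Circular cross-correlation of probe with reference, computed via FFT."""
    spectrum = np.fft.fft2(probe) * np.conj(np.fft.fft2(reference))
    return np.real(np.fft.ifft2(spectrum))

def peak_location(plane):
    """Row and column of the largest value in the correlation plane."""
    row, col = np.unravel_index(np.argmax(plane), plane.shape)
    return int(row), int(col)

# Hypothetical stored image and a shifted copy of it.
rng = np.random.default_rng(1)
reference = rng.choice([0.0, 1.0], size=(32, 32))
shifted = np.roll(reference, shift=(3, 5), axis=(0, 1))

plane_centered = correlation_plane(reference, reference)
plane_shifted = correlation_plane(reference, shifted)

peak_centered = peak_location(plane_centered)   # peak sits on the pinhole
peak_shifted = peak_location(plane_shifted)     # peak moves with the input

# A pinhole fixed at the zero-shift position samples only this one value.
pinhole_centered = plane_centered[0, 0]
pinhole_shifted = plane_shifted[0, 0]
```

The correlation peak follows the input shift, so the value seen through the fixed pinhole drops from the full autocorrelation peak to a background level once the input is displaced.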

The experimental apparatus of the loop system (Fig. 5) is
shown in Fig. 11. The system was constructed with the
microchannel spatial light modulator as the threshold device, and
the holograms were recorded on thermoplastic plates. The
four Fourier transform lenses and the pinhole array are visible
in the photograph. Several results obtained with this setup are
shown in Fig. 12. The four faces used as the memories are
displayed in Fig. 12(a). Figures 12(b) and 12(c) show the
reconstruction of the first and second holograms, respectively
(see Fig. 5). Note that the reconstruction of the first
hologram, used for recognition in this architecture, is edge
enhanced. This was accomplished by recording the hologram
such that the high spatial frequency portion of the spectrum
was enhanced, which ensures that the cross-correlations
between the four faces are much smaller than the autocorrelation
peaks. The second hologram, used for read out of the stored
information, is recorded so that a faithful reconstruction is
obtained using diffuse illumination during the recording. The
partial input and the complete recalled image are shown in
Figs. 12(d) and 12(e), respectively. The noise evident in Fig.
12(e) is speckle, a consequence of the diffuser used to form
the second hologram. Comparing Fig. 12(e) with Fig. 10, we
see that in Fig. 10 there is still evidence of crosstalk
superimposed on the reconstructed images, while no crosstalk is
detectable in Fig. 12(e). The thresholding performed by the
spatial light modulator, the high space-bandwidth product of
the images used, and the virtual orthogonalization of the four
memories accomplished by the high pass filtering in the first
stage combine to eliminate the crosstalk in a single pass
through the loop in the second experiment.

Fig. 11. Experimental setup of the loop.

INVARIANCE AND DISCRIMINATION PROPERTIES OF
THE OPTICAL ASSOCIATIVE LOOP

Ken Hsu and Demetri Psaltis

California Institute of Technology
Department of Electrical Engineering
Pasadena, California 91125

Introduction

In  this  paper  we  report  recent  experimental  results  from  the  optical  associative  mem¬ 
ory  that  we  have  described  previously  [1,2].  This  system  is  a  single  layer  neural  network 
architecture simulating a 2-D array of approximately 10^5 neurons on which images can
be  represented.  This  2-D  array  of  neurons  is  fully  interconnected  via  holograms  and  the 
system is organized as an auto-associative memory with feedback. An external image projected
into the system causes one of the stored images to become a stable state of the
system.  The  ability  of  the  system  to  recognize  distorted  versions  (e.g.  rotated,  shifted, 
or  scaled)  of  a  stored  image  depends  critically  on  the  gain  of  the  system  as  the  light  goes 
around  the  loop.  High  gain  provides  invariance  to  distortions  but  ultimately  it  also  leads 
to  a  loss  in  discrimination  against  unfamiliar  images.  Thus  there  is  an  optimum  choice  of 
parameters  of  the  system  that  yields  optimum  performance.  In  what  follows  we  describe 
how  the  parameters  affect  the  performance  of  the  memory  and  we  report  the  performance 
(in  terms  of  discrimination  vs.  invariance)  obtained  by  the  experimental  system. 

Experimental  System 

A schematic diagram of the optical associative loop is shown in Fig. 1 and a photograph
of the experimental apparatus is shown in Fig. 2. This processor consists of
two cascaded correlators, of which the first is used for calculating the degree of similarity
between the external input image and the images stored in the hologram. The second
correlator uses the output from the first correlator to reconstruct the same images, which are
also stored in the second hologram, to provide the feedback signal for the loop. The
operation of this associative loop can be explained with the aid of the block diagram shown in
Fig. 3a. In this example four images are spatially separated and stored in the Fourier transform
holograms H1 and H2 as shown in Fig. 3b. When the pattern A is presented as an
input to the system, the first correlator produces the auto-correlation pattern along with
three cross-correlations at plane P2. The pinhole array at P2 samples these correlation
patterns at the middle of each pattern, where the inner products between the input and
each of the stored images form. Each of the four beams that go through the pinholes goes
through the second correlator to reconstruct the four images stored in hologram H2. These
reconstructed images are spatially separated and superimposed at plane P1. The stored
image which is most similar to the input pattern gives the strongest correlation signal and
hence the brightest reconstructed image. The weak read-out from the cross-correlations
can be eliminated by thresholding in the LCLV. The output of the LCLV becomes the new
input image for the loop and thus iterations take place. The stable pattern that forms as
a recirculating image in the loop is the stored image that is most similar to the original
input.

In the system of Fig. 1 the input pattern is imaged onto the LCLV by lens Li and
through beam splitter BS2. A collimated argon laser beam illuminates the read-out side
of the LCLV through beam splitters BS3 and BS1. A portion of the reflected light from
the LCLV that propagates straight through BS1 is diverted by BS3 and is imaged by
lens Lo onto a CCD television camera. This provides real time monitoring of the activity
of the system. The portion of the light that is reflected by BS1 into the loop is Fourier
transformed by lens L1 and illuminates the hologram H1. The correlation between the
input image and each of the stored images is displayed at plane P2. The pinhole array at
P2 has center spacings corresponding to the spatial separations of the stored images. The
remainder of the optical system from P2 back to the neural plane P1 is essentially a replica
of the first half, with the hologram H2 storing the same set of images as H1. Fig. 4 shows
an example of an experiment performed with this loop. Fig. 4a is the external input,
in this case a partially obstructed image of one of the stored patterns. Fig. 4b shows the
response of the system with the external input still present, and Fig. 4c shows the stable
state of the loop after the external input is removed.

One of the interesting properties of this system is its dynamics. The time for the
loop to reach a stable state depends very much on the initial conditions. Fig. 5a shows
the temporal response of the loop to an input pattern. When the signal in the lower
trace becomes high, it indicates that the external input is ON. The upper trace shows the
corresponding response of the loop. The initial rise is due to the presentation of the input
whereas the second rise is due to the fact that the feedback path was closed. It is seen
that it takes about two seconds for the loop to reach a stable state whereas the rise time
of the LCLV is approximately one second in the mode in which we operated it. When the
external input is turned off, the loop remains latched to a stable state which is one of the
stored images. Fig. 5b shows the same experiment but with the input intensity reduced to
one third of that of the first input. The second rise of the upper trace shows that it takes
approximately four seconds for the loop to reach its stable state. After the input is turned
off the loop gives the same output intensity. This example shows that the initial conditions
affect the dynamics of the loop but do not affect its final state. We will see in the next
section similar invariances when the input is shifted and rotated.

The loop dynamics and related invariance properties can be best understood by using
a network model as shown in Fig. 6. Each resolution element of the LCLV simulates a
separate neuron and, with the resolution of the device used being approximately 400 x 400
pixels, 160,000 neurons are simulated. Each of these neurons is globally connected and
fed back to every neuron via the two holograms. The optical signal is attenuated in the loop
due to the diffraction efficiencies of the Fourier transform holograms and the losses from
the pinholes as well as the lenses and beam splitters. Therefore the neurons have to provide
optical gain to compensate for this loss. In our system this is achieved by adding an image
intensifier at the photoconductor side of the LCLV. The microchannel plate of the image
intensifier is sensitive to a minimum incident intensity of approximately 1 nW/cm^2 and it
reproduces the input with an intensity 10^4 times brighter (10 μW/cm^2). This is bright enough
to drive the LCLV. If we use a beam with intensity equal to 10 mW/cm^2 to read the LCLV then
the intensity of the output light is approximately 1 mW/cm^2. Thus, the combination of the
image intensifier and the LCLV provides optical gain up to 10^6. Fig. 7a shows the input-output
characteristics of the optical thresholding element, which is similar to a sigmoid
function. The optical gain can be adjusted by changing the bias voltage of the image
intensifier. Fig. 7b shows the relationship between the bias voltage applied to the image
intensifier and the gain.

The dynamics of the recall process can be described by using an iteration map formed
by the gain and loss curves as shown in Fig. 8. In the figure the slope of the straight line
is proportional to the loop loss due to the holograms and the pinholes, and the line is
superimposed on the input-output response of the neurons. The intersection of this line with
the neuron gain curve at point Q1 determines the threshold level, and Q2 represents a stable
point. If the initial condition of the neuron is above the threshold level determined by Q1,
the signal grows in each iteration until it arrives and latches at Q2. On the other hand, if
the initial condition is below this threshold the signal will decay to zero. The number of
iterations depends on the distance of the initial condition from the threshold. This explains
the dynamics of Fig. 5. Similarly, if the loop loss is low or the neuron gain is high one can
expect that the loop will converge faster to a stable state. Raising the gain also has the
effect of lowering the threshold of the system. In the following section we will see that the
setting of the gain is the key parameter that mediates the trade-off between distortion
invariance and discrimination capability of the loop.
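
The iteration map can be explored numerically with a toy model. The sketch below is an illustration under stated assumptions, not the measured characteristic of the LCLV: the neuron response is taken to be a hypothetical sigmoid-like curve `saturation * x^2 / (x^2 + half_point^2)`, the loop loss is a single multiplicative `transmission` factor, and all parameter values are chosen only so that the threshold and stable point fall at convenient numbers.

```python
import math

def neuron_response(x, saturation=10.0, half_point=0.3):
    """Hypothetical sigmoid-like input-output curve of one thresholding neuron."""
    return saturation * x * x / (x * x + half_point * half_point)

def loop_pass(x, transmission=0.1, **neuron):
    """One trip around the loop: neuron gain followed by the linear loop loss."""
    return transmission * neuron_response(x, **neuron)

def settle(x0, tol=1e-6, max_iters=1000, **kwargs):
    """Iterate the map until successive values differ by less than tol.
    Returns (final value, number of iterations)."""
    x = x0
    for n in range(1, max_iters + 1):
        x_next = loop_pass(x, **kwargs)
        if abs(x_next - x) < tol:
            return x_next, n
        x = x_next
    return x, max_iters

def fixed_points(transmission=0.1, saturation=10.0, half_point=0.3):
    """Nonzero intersections of the loss line with the gain curve,
    returned as (threshold, stable point)."""
    ta = transmission * saturation
    root = math.sqrt(ta * ta - 4.0 * half_point * half_point)
    return (ta - root) / 2.0, (ta + root) / 2.0

threshold, stable = fixed_points()       # about 0.1 and 0.9 for these values
strong, n_strong = settle(0.5)           # initial condition well above threshold
weak, n_weak = settle(0.12)              # initial condition just above threshold
dead, _ = settle(0.05)                   # initial condition below threshold
high_gain_threshold, _ = fixed_points(saturation=20.0)  # doubled neuron gain
```

In this toy model both above-threshold starts latch at the same stable point, the start nearer the threshold needs more iterations, the below-threshold start decays to zero, and doubling the gain lowers the threshold, which mirrors the qualitative behavior described in the text.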

Invariance versus Discrimination Trade-offs

In the previous section we saw that as long as the gain is high enough and the external
input is strong enough to produce an initial condition for the LCLV that is above threshold,
the loop will converge to one of the stored stable states. Since the external input
does not affect the shape of the final state, but rather selects which state is produced,
we can build a degree of invariance into the system, since a shifted, rotated or scaled version
of a stored image can cause the stored image to be recalled. The effect of such distortions
of the input image is to decrease the level of the initial condition. Therefore, by raising
the neuron gain, no matter how much we change the initial condition by rotating, shifting,
and scaling the input image, the loop can always be made to produce an image as a stable
state. But the ability to correctly recognize a stored image from a distorted input and the
discrimination capability, i.e. the ability to distinguish images from one another, are two
things that compete with each other. If there is too much gain then just shining a
flashlight at the input of the system causes it to lock on to one of its stable states. If the gain
is set too low then even an input that is a slightly distorted version of one of the stored
images is not recognized. There are two parameters under our control that can affect the
gain in the loop: the gain of the neurons and the size of the pinhole.

We will use Fig. 3b as an example. Let f_i(x,y), i = 1, 2, 3, 4, represent the images of
the letters A, B, C, D, respectively, and let the pinhole size be W. Then the reconstructed
images in the window at P1 can be shown to be

«'al 

where  *  represents  the  convolution  operation,  yii(x,y)  is  the  auto-correlation  of  A  and 
li  sfs  the  cross-correlations  of  A  with  B,  C,  D,  respsctively.  We  see  that  the 
images are blurred by the finite dimension of the pinholes. Decreasing W gives better
image quality but we need to increase the gain of the neurons to compensate for the loss
due to the small pinholes. At the other limit, if the pinhole size is increased we do not need
high gain neurons but the image quality deteriorates. In the limit when W becomes
infinitely large, the reconstructed image in the window at P1 becomes a superposition of
all the stored images, each equally strong, and severely blurred. Thus there is an optimum
pinhole size and an optimum neuron gain. Fig. 9 shows the minimum gain required and
maximum gain allowable for the loop to sustain a stable memory as a function of pinhole
size. Below the minimum gain the loop cannot recognize any image, in the sense that
once the external input is cut off the loop activity decays to zero. Above the maximum
gain the loop loses discrimination capability, meaning that any input image, even a
flashlight, will induce the loop into a stable state. This behavior is consistent with our previous
predictions. Note that the minimum gain increases when the pinhole size is increased to
more than 250 μm. This is because the reconstructed images are blurred so much that
the correlation peaks are weakened and the losses in the loop are increased. Fig. 9 shows
that the optimum pinhole size in this system is in the range of 70 μm to 150 μm. We
chose 90 μm for the rest of the experiments.

Two kinds of invariance were studied: shift and rotation. The images stored in the
holograms  were  four  faces.  The  invariance  capability  was  measured  by  presenting  to  the 
network  one  of  the  stored  images  rotated  and/or  shifted  by  varying  amounts  and  monitor¬ 
ing  the  response  of  the  system  under  various  gain  conditions.  From  Fig.9,  the  minimum 
gain  for  this  pinhole  used  is  2.8  x  10^  and  the  maximum  gain  is  1.2  x  10*.  We  made 
measurements  under  low  gain  (=3  x  10*)  and  high  gain  (=:10*)  conditions.  The  results 
of the shift experiment are shown in Fig. 10. Fig. 10a shows that as the input image is
shifted away from the memory position, the loop response time becomes longer because
the correlation signal is shifted away from the pinhole. This makes the initial condition of
the loop weaker, so it takes more iterations to reach a stable state. If the input is shifted
too much then the correlation peak misses the pinholes completely and the input is not
recognized. However, the output intensity is shift invariant as long as the loop recognizes
the input. Fig. 10b shows that the tolerance to shift can be increased by increasing the
neuron gain. But in this high gain region the loop has poor discrimination capability and
it also incorrectly recognizes a similar face as one of the stored images.

The dynamics and invariance properties under rotation of the input were also measured
by using the same pinhole diameter and optical gain. The results are shown in
Fig. 11. It is seen that by increasing the optical gain from 10* to 10* the allowable rotation
angle for the input is increased from 8 degrees to 16 degrees. Again the dynamics and
rotation invariance are consistent with our predictions.

Figure 3. Operation principle of the optical associative loop. (a) Block diagram of the
optical loop. (b) Example of recalling one of the stored images from an associative input.

Figure 4. Example of auto-associative recalling. (a) Half-face image input into the loop.
(b) Overlap of the input image with the recalled complete image. (c) Stable image in the
loop after the input was cut off.

Figure 5. Temporal response of the optical loop. (a) Loop response with a strong initial
condition. (b) Loop response with a weak initial condition.

Figure 6. Neural network model of the optical auto-associative loop.

Figure 8. Dynamics of the optical loop, showing the threshold value of the loop and the
initial conditions of the loop.

Figure 9. The allowable gain region of the optical loop versus pinhole size.

Figure 10. The shift invariance of the optical associative loop. (a) Neuron gain = 3 x 10^.
(b) Neuron gain = 1 x 10*. The upper curve is the loop output intensity, and the lower
curve is the loop rise time.

Figure 11. The rotation invariance of the optical associative loop. (a) Neuron gain =
3 x 10^. (b) Neuron gain = 1 x 10^. The upper curve is the loop output intensity, and the
lower curve is the loop rise time.

Bias-free time-integrating optical correlator using a
photorefractive crystal

Demetri Psaltis, Jeffrey Yu, and John Hong


An acoustooptic time-integrating correlator is demonstrated using a photorefractive crystal as the
time-integrating detector.


I. Introduction

Time integration has proved to be a powerful technique
in optical signal processing and has been used in
a wide variety of architectures. A major drawback of
time-integrating processors is the buildup of bias in
addition to the signal. This occurs because the
photogenerated charge that is integrated on the detector is
proportional to the intensity of the optical signal, which
makes it necessary to represent bipolar signals on a
bias. The effective system dynamic range at the output
is given by DR' = DR [SBR/(1 + SBR)], where DR
is the dynamic range of the output detector and SBR is
the signal-to-bias ratio on the detector. In most cases
of interest, the SBR is much smaller than unity and
thus the added bias significantly reduces the usable
dynamic range of the system.
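
To make the effect of the bias concrete, the relation DR' = DR [SBR/(1 + SBR)] can be evaluated for a few signal-to-bias ratios. The Python sketch below is only a worked example: the function name and the detector dynamic range of 1000 are hypothetical values chosen for illustration, not figures from the paper.

```python
def effective_dynamic_range(detector_dr, sbr):
    """Effective output dynamic range DR' = DR * SBR / (1 + SBR)."""
    if detector_dr <= 0 or sbr < 0:
        raise ValueError("detector_dr must be positive and sbr non-negative")
    return detector_dr * sbr / (1.0 + sbr)

# Hypothetical detector dynamic range, not taken from the paper.
detector_dr = 1000.0
for sbr in (0.01, 0.1, 1.0, 10.0):
    usable = effective_dynamic_range(detector_dr, sbr)
    print(f"SBR = {sbr:5.2f} -> DR' = {usable:7.1f}")
```

For SBR much smaller than unity the usable range is close to DR times SBR, so a detector with a dynamic range of 1000 and an SBR of 0.1 retains a usable range of less than 100.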

The  most  frequently  used  method  for  separating  the 
signal  from  the  bias  involves  placing  the  signal  on  a 
spatial  carrier  and  then  electronically  filtering  the  out¬ 
put  of  the  integrator.  This  method  of  bias  removal, 
however,  does  not  solve  the  dynamic  range  problem 
since the processing is done after the detection of the
signal. Also, an additional constraint is placed on the
resolution of the detector, since the pixel separation
must be less than one-half of the period of the carrier
being  recorded,  which  will  result  in  a  significant  reduc¬ 
tion  in  the  available  space-bandwidth  product  at  the 
output. 

In this paper a new method for performing
time-integrating correlation is described using a
photorefractive bismuth silicon oxide (BSO) crystal as the
time-integrating element. The correlation is formed
on a spatial carrier in the crystal and read out with an
auxiliary beam. Since only the signal recorded on a
spatial carrier is stored in the photorefractive crystal,
the diffracted light that is detected contains the
correlation information without the bias. The bias does not
reduce the dynamic range of the output detector used
for final readout, but rather the diffraction efficiency
of the BSO crystal. In addition, the resolution of the
BSO crystal is very much higher than that of a CCD
detector, allowing the correlation of very high
space-bandwidth signals to be formed on a carrier. Finally,
since the result of the time-integrating correlator is
read out optically, the output can be easily interfaced
with other optical systems, thus making new
architectural designs possible.

In Sec. II, the theory of optical recording in
photorefractive crystals is reviewed and extended to the use of
photorefractive crystals as time-integrating elements.
The architecture and experimental results are
described in Sec. III. Dynamic range, linearity, system
limitations, and other performance aspects are
discussed in Sec. IV.

II. Photorefractive Crystals as Time-Integrating Optical
Detectors

When a photorefractive BSO crystal is illuminated
by an intensity grating, electrons are excited from
traps into the conduction band. These charges
migrate due to diffusion and drift from an externally
applied electric field and then recombine in dark
regions, creating a spatially varying internal
space-charge field. This field modifies the index of
refraction in the crystal through the linear electrooptic effect
and, as a result, a holographic phase grating is recorded
in the crystal. Grating formation in photorefractive
media has been extensively studied and modeled.
We will show here that the photorefractive crystal acts
as a time-integrating element.

Let the intensity incident on the crystal be as follows:

$$I(x,t) = I_0 + \mathrm{Re}[I_1(x,t)\exp(ikx)] \quad \text{for } t > 0. \qquad (1)$$

Assuming that self-diffraction effects are negligible
and that the spatial variations of $I_1(x,t)$ are small compared
to the grating frequency $k$, the intensity of the
light that is diffracted when the crystal is illuminated
by a readout beam can be shown to be
$I_R$ is the intensity of the readout beam and $K_1$ is a
complex constant involving the material parameters of
the crystal, the grating frequency, and the applied
electric field. $\tau$ is the complex time constant of the
space-charge field and is given by $\tau = K_2/I_0$. $K_2$ is
also a complex constant that depends on the photorefractive
material used and the experimental conditions.
$I_0$ is the average light intensity incident on the
crystal during exposure.

If $I_1(x,t)$ is expanded into its temporal Fourier components,
then the output intensity for $t \gg \tau$ can be written as
follows:
The above is recognized to be a low-pass filter with
cutoff frequency $1/|\tau|$, which is approximately equivalent
to the output of a sliding window integrator with
integration time $\tau$. Thus,
Hence, the output intensity can be treated as the
square of the normalized integration of the signal $I_1$.
An interesting observation is that, if $I_1$ were independent
of time, the output intensity depends only on
the ratio of the modulated intensity $I_1$ to the dc intensity
$I_0$.
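The low-pass behavior described above can be illustrated numerically. The following is a minimal sketch, assuming a real, positive time constant and a simple first-order response dE/dt = (u - E)/tau for the normalized grating amplitude. The function name and this simplified model are illustrative assumptions and are not the expressions of the paper, whose time constant is complex.

```python
import math

def first_order_response(u, dt, tau):
    """Integrate dE/dt = (u(t) - E)/tau with E(0) = 0 (assumed model).

    u   : list of normalized drive samples (the ratio I_1/I_0 in the text)
    dt  : sample spacing in seconds
    tau : real, positive time constant in seconds (an assumption)
    """
    decay = math.exp(-dt / tau)
    e = 0.0
    out = []
    for sample in u:
        # Exact update for a drive held constant over one sample interval.
        e = sample + (e - sample) * decay
        out.append(e)
    return out

dt, tau = 1e-3, 0.1

# A constant drive settles at the drive level once t >> tau.
steady = first_order_response([0.5] * 2000, dt, tau)[-1]

# A 50 Hz tone lies well above the cutoff 1/tau and is strongly attenuated.
tone = [0.5 * math.cos(2 * math.pi * 50.0 * k * dt) for k in range(2000)]
fast = first_order_response(tone, dt, tau)
```

Under this assumed model, slowly varying drives pass through unchanged while rapid fluctuations average out, which is the sense in which the crystal acts as an integrator over a window of roughly tau.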


III. Experimental Demonstration

A schematic diagram of the experimental system is
shown in Fig. 1. The input electrical signals are mixed
with the center frequency of the acoustooptic devices
(AODs) and fed into the AODs. The first AOD is
illuminated by a collimated wave and the upshifted
diffracted order is imaged onto the second AOD, then
reimaged onto the photorefractive BSO crystal. The
second AOD is oriented such that the acoustic signal is
counterpropagating with respect to the image of the
acoustic signal from the first AOD. The undiffracted
light transmitted through the first AOD is incident at
the Bragg angle of the second AOD. The upshifted
diffracted order of the second AOD is also imaged onto
the BSO crystal. The undiffracted light is spatially
filtered before reaching the BSO crystal. In this arrangement,
the AODs are parallel to each other, but
the diffracted orders propagate at an angle with respect
to each other even though both diffracted beams
are temporally upshifted. This causes the signals
from the two AODs to interferometrically record the
correlation signal on the BSO crystal at a high spatial
frequency. Let the inputs to the AODs be
$v_1(t) = a(t)\exp(i\omega_0 t)$ and $v_2(t) = b(t)\exp(i\omega_0 t)$,
where $\omega_0/2\pi$ is the
center frequency of the AOD. The intensity incident
on the photorefractive crystal is

$$\begin{aligned}
I(x,t) &= |a(t - x/v)\exp(i\gamma x) + b(t + x/v)\exp(-i\gamma x)|^2 \\
&= |a(t - x/v)|^2 + |b(t + x/v)|^2
 + 2\,\mathrm{Re}[a(t - x/v)\,b^*(t + x/v)\exp(i2\gamma x)], \qquad (5)
\end{aligned}$$

where $v$ is the acoustic velocity of the AOD and
$\gamma = \omega_0/v$. We will treat the case where $|a(t)|^2$ and $|b(t)|^2$
can both be approximated as constants, as is the case
for FM signals. Then, the intensity pattern can be
separated into a dc term $I_0$ and a signal term $I_1(x,t)$
modulating a spatial carrier $\cos(2\gamma x)$ in the form of Eq.
(1). This intensity pattern results in the formation of
a hologram on the photorefractive crystal as described
in the previous section. The hologram is read out with
an auxiliary beam and is imaged onto a charge coupled
device (CCD) detector for readout.

If the assumption is valid that $I_1(x,t)$ has spatial
frequencies which are small compared with the carrier
frequency $2\gamma$, we can use Eq. (4) to obtain an expression
for the output intensity detected by the CCD:
and by defining the variable $t_1 = t - x/v$,

$$I_{\mathrm{out}}(x) \propto \left| \int_{\tau} a(t_1)\, b^*(t_1 + 2x/v)\, dt_1 \right|^2. \qquad (6)$$

Hence the system produces the magnitude square of
the correlation between the signals $a(t)$ and $b(t)$ integrated
over a finite interval $\tau$.

Fig. 2. Output of a standard time-integrating correlator without
noise.

Fig. 3. Output of the bias removal correlator without noise.
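The magnitude-squared correlation of Eq. (6) can be sketched in discrete form. The example below is an illustration under stated assumptions: the signals are sampled, the integer lag stands in for the delay 2x/v, and the chirp parameters are arbitrary rather than those of the experiment. The function names are hypothetical.

```python
import cmath
import math

def chirp(n, bandwidth_fraction=0.25):
    """Unit-amplitude complex linear chirp with n samples (illustrative signal)."""
    rate = bandwidth_fraction / n
    # The quadratic phase gives a frequency sweep symmetric about the center.
    return [cmath.exp(1j * math.pi * rate * (k - n / 2) ** 2) for k in range(n)]

def correlation_magnitude_squared(a, b, max_lag):
    """|sum_k a[k] * conj(b[k + lag])|^2 for lag in [-max_lag, max_lag]."""
    n = len(a)
    out = []
    for lag in range(-max_lag, max_lag + 1):
        acc = 0j
        for k in range(n):
            j = k + lag
            if 0 <= j < n:
                acc += a[k] * b[j].conjugate()
        out.append(abs(acc) ** 2)
    return out

s = chirp(256)
peak_profile = correlation_magnitude_squared(s, s, max_lag=32)
peak_index = max(range(len(peak_profile)), key=peak_profile.__getitem__)
```

For an autocorrelation the largest value falls at zero lag, which corresponds to the position x = 0 on the crystal. No bias term appears because only the product of the two signals is accumulated.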

Flint glass acoustooptic cells driven at a center frequency
of 70 MHz were used in the experiment. A
symmetric linear chirp signal with bandwidth $\Delta f = 5$
MHz was fed into each cell to produce the autocorrelation
peak. The Bragg angle of the AODs was 0.2°,
which corresponded to a grating frequency equal to 35
lines/mm in the BSO crystal.
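As a consistency check on these numbers, the carrier cos(2γx) with γ = ω_0/v has a spatial frequency of 2f_0/v, where f_0 is the center frequency. The sketch below assumes unit magnification between the AODs and the crystal, which the text does not state, and backs out the acoustic velocity implied by the quoted 70 MHz and 35 lines/mm. The function names are hypothetical.

```python
def carrier_lines_per_mm(center_frequency_hz, acoustic_velocity_m_s):
    """Spatial frequency of the cos(2*gamma*x) carrier, with gamma = omega_0 / v.

    Assumes unit magnification between the AOD and the crystal.
    """
    cycles_per_m = 2.0 * center_frequency_hz / acoustic_velocity_m_s
    return cycles_per_m / 1000.0

def implied_velocity_m_s(center_frequency_hz, lines_per_mm):
    """Acoustic velocity consistent with a given carrier frequency."""
    return 2.0 * center_frequency_hz / (lines_per_mm * 1000.0)

# Values quoted in the text: 70 MHz center frequency, 35 lines/mm grating.
v = implied_velocity_m_s(70e6, 35.0)
```

Under the unit-magnification assumption the implied velocity is about 4 km/s.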

The BSO crystal used in the experiment was cut in
the (110) direction and measured 15 × 15 × 2 mm. An
external electric field of 7 kV/cm was applied in the
(001) direction of the crystal, which was also the direction
of the grating vector.

The correlation was recorded on the crystal with an
argon laser at a wavelength of 514 nm with average
intensity equal to 1 µW/cm². The correlation was read
out with a He-Ne laser (λ = 633 nm) with 150-µW/cm²
intensity. Cylindrical lenses (not shown in Fig. 1)
were used to expand the output of the AODs, thereby
illuminating the full aperture of the BSO crystal, and
also to focus the diffracted light onto a 1-D CCD.

The output signal-to-bias ratio of a conventional
time-integrating correlator is reduced when the levels
of the two signals are unequal and/or if there is additive
noise present in the system. Both conditions were
simulated experimentally. Noise was simulated by
adding a 70-MHz signal to the input of one of the
AODs. The output of a standard time-integrating
correlator (i.e., the correlation formed directly on the
CCD) for the noise-free case and equal amplitude signals
is shown in Fig. 2. This condition provides the
maximum signal-to-bias ratio for the system. We can
see in Fig. 2 that there is still a strong bias term added
to the correlation peak. The correlation produced by
temporally integrating on the photorefractive crystal
is shown in Fig. 3. In this case, all the bias due to
temporal integration is removed, and any residual bias
is due entirely to dark current from the CCD. The
outputs of the bias removal correlator with input
signal-to-noise ratios of 0 and -10 dB are shown in Figs. 4
and 5, respectively. Again, bias levels which appear in
the figures were entirely due to the integration of dark
current in the output detector. In practice, the detector
dark current can be minimized by increasing the
intensity of the readout beam, thereby decreasing the
required integration time of the output CCD detector,
and/or cooling the detector.

Fig. 4. Output of the bias removal correlator with a signal-to-noise
ratio of 0 dB.

Fig. 5. Output of the bias removal correlator with a signal-to-noise
ratio of -10 dB.


IV. Performance

The experimental results described in the previous
section show a dramatic qualitative improvement in
the correlation that is obtained when the photorefractive
crystal is used instead of the CCD. In this section
we examine certain characteristics of this method
which are useful for quantitatively evaluating its performance.
Specifically, we examine the linearity, integration
time, dynamic range, and sensitivity of the
correlator.


A. Linearity

In a conventional time-integrating correlator (coherent
or incoherent), the output correlation is basically
proportional to the signals applied to the AODs.
Nonlinearities occur only when we exceed the linear
dynamic range of the devices used, i.e., if the diffraction
efficiency of the AOD exceeds several percent or
the integrating detector is driven to saturation. In the
photorefractive time-integrating processor, the output
intensity is a nonlinear, monotonically increasing
function of the input voltage. The nonlinearity arises
because of the square-law detection at the final readout
stage and the recording mechanism in the photorefractive
crystal. The nonlinear relationship is now
studied analytically and experimental verification of
the theoretical results is presented.

Let $v_1(t) = s(t)$ be a fixed reference signal and
$v_2(t) = a\,s(t)$ be an input signal of varying amplitude
($0 < a < 1$). Since the correlation term contains spatial
frequencies which are much lower than the grating
frequency, near the correlation peak ($x \approx 0$) the intensity
incident on the photorefractive crystal is

$$I(x,t) = (1 + a^2 + 2a\cos kx)\,|s(t)|^2.$$

Using Eq. (1), the output intensity at the CCD is proportional
to

$$\left[\frac{2a}{1 + a^2}\right]^2.$$

The modulation depth of the intensity incident on the
BSO crystal is

$$m = \frac{2a}{1 + a^2},$$

and hence

$$I_{\mathrm{out}} \propto m^2 = \frac{4a^2}{(1 + a^2)^2}. \qquad (7)$$

Fig. 6. Normalized output intensity vs modulation depth.

Fig. 7. Theoretical plot of output intensity vs input voltage ratio.

Figure 6 is a graph of the output intensity at the
correlation peak vs the modulation depth incident on
the crystal. The experimental result is in excellent
agreement with the square-law relationship predicted
by Eq. (7).

A plot of the output intensity as a function of the
amplitude of the input signal $a$ is shown in Fig. 7. The
nonlinear relationship between the input and output
signals is generally a disadvantage since the scaling of
signals of varying amplitudes will be nonlinear. This,
however, will not cause a problem if the correlator is
used only as a signal detection device, since correlation
peaks will still be discernible and only the threshold
level need be adjusted accordingly to maximize the
probability of detection.
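The shape of the input-output curve can be tabulated directly. This is a minimal sketch, assuming the modulation depth m = 2a/(1 + a^2) quoted later in the dynamic-range discussion and a peak output proportional to m^2. The function names are illustrative.

```python
def modulation_depth(a):
    """Modulation depth for a unit reference and an input of relative amplitude a."""
    return 2.0 * a / (1.0 + a * a)

def normalized_output(a):
    """Peak output intensity, normalized to 1 at a = 1 (assumed proportional to m^2)."""
    return modulation_depth(a) ** 2

ratios = [k / 10 for k in range(0, 11)]
curve = [normalized_output(a) for a in ratios]
```

The curve rises monotonically on 0 < a < 1, so a single threshold still separates strong from weak peaks, but equal steps in input amplitude do not give equal steps in output.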

B.  Integration  Time 

In a conventional time-integrating correlator, the
integration time is limited by the dark current buildup
on the output detector, typically up to several hundred
milliseconds. When the photorefractive crystal is
used, the integration time is determined by the rise
time of the internal space-charge field, which can easily
be made much longer. The correlation can be read out
at any rate that is convenient by an auxiliary detector
array.

Fig. 8. Output intensity at correlation peak vs time as a function of
different average incident intensities.

The integration time is approximately equal to $|\tau|$,
where

$$|\tau| = |K_2|/I_0. \qquad (8)$$

Hence, the integration time of the bias removal correlator
can be controlled by varying the writing intensity.
This control is important since the integration
time can be matched to the length of the reference
signal, thereby increasing the probability of detection
of a weak signal.

The time response of the correlation peak for different
values of average incident intensity is shown in Fig.
8. Figure 9 is a plot of intensity vs the inverse of the
experimentally observed rise time. There is excellent
agreement between the experiment and Eq. (8).

The integration time, however, has a finite range
over which it can be adjusted. The maximum integration
time is limited by the thermal effects in the crystal.
If the rate at which carriers are generated thermally
becomes comparable with the rate at which they
are photogenerated, the modulation depth of trap density
will be reduced. As a result, the diffraction efficiency
of the grating will decrease. In practice, the
minimum integration time is limited by the maximum
light intensity that is available for recording. The
integration time can be reduced to 30 msec if the
incident intensity is made equal to 18 mW/cm². This
power level, however, is simply not practical for most
applications.
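The trade-off between writing intensity and integration time can be estimated from the one operating point quoted above. The sketch below assumes that the product of integration time and average intensity stays constant over the whole range, which is an idealization of the inverse relation between the time constant and the intensity. The function name and default arguments are illustrative.

```python
def writing_intensity_for(tau_s, ref_tau_s=30e-3, ref_intensity_w_cm2=18e-3):
    """Average intensity (W/cm^2) giving an integration time of tau_s seconds.

    Assumes |tau| * I_0 is constant, anchored at the quoted operating point
    of 30 msec at 18 mW/cm^2.
    """
    return ref_intensity_w_cm2 * ref_tau_s / tau_s

# Intensity that this scaling would associate with a 3 s integration time.
intensity_for_3s = writing_intensity_for(3.0)
```

Under this assumption a hundredfold longer integration time needs a hundredfold lower intensity, which is why integration times of a few milliseconds demand impractical optical power.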

C.  Dynamic  Range  and  Sensitivity 

Since the output of the bias removal correlator is
presented without bias, the output dynamic range of
the system is essentially equal to the dynamic range of
the readout detector array. To characterize the performance
of the system we need to determine how the
input signal levels are mapped to this output dynamic
range. Let the dynamic range of the photorefractive
crystal be defined as $DR_{BSO} = m_{\max}/m_{\min}$, where
$m_{\max}$ is the maximum modulation depth ($m_{\max} = 1$), and
$m_{\min}$ is the minimum modulation depth for which a
diffracted signal is detectable above the output scatter
and noise level of the system.

Fig. 9. Inverse of the rise time vs average incident intensity.

Given two input signals $v_1(t) = a\,s(t)$ and $v_2(t) = s(t)$,
the modulation depth of the light incident on the crystal
is $m = 2a/(a^2 + 1)$. Thus, the minimum detectable
input signal is given by $a_{\min} \approx m_{\min}/2 = 1/(2\,DR_{BSO})$. The
useful range over which $a$ can vary is limited by $DR_{BSO}$.
From $a_{\min}$, one can define an input dynamic range
given by $DR_{\mathrm{input}} = 1/a_{\min}^2 = 4/m_{\min}^2$. The most important
parameter in determining the system dynamic
range is the minimum detectable modulation depth
$m_{\min}$. Experimentally, we measured the dynamic
range to be equal to 23 dB. This corresponds to a
minimum modulation depth of 0.142. We expect that
through careful design this can be substantially improved.
However, all the mechanisms that determine
$m_{\min}$ are not fully understood. It is believed that
besides detector noise and scattering from the crystal,
the modulation depth is limited by thermal effects in
the material and shot noise arising from the internal
currents.
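The two quoted figures, 23 dB and a minimum modulation depth of 0.142, can be checked against each other. The sketch below assumes the input dynamic range is 4 divided by the square of the minimum modulation depth, expressed as a power ratio in decibels. The function name is illustrative.

```python
import math

def input_dynamic_range_db(m_min):
    """Input dynamic range 4 / m_min^2, in dB (10 log10 of the ratio)."""
    return 10.0 * math.log10(4.0 / m_min ** 2)

dr_db = input_dynamic_range_db(0.142)
```

With a minimum modulation depth of 0.142 this evaluates to within 0.1 dB of the measured 23 dB.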

Another important aspect of the correlator system is
its sensitivity, or the minimum signal-to-noise ratio
that is detectable. This parameter is also determined
by the minimum detectable modulation depth, $m_{\min}$.
Given a reference signal $v_1(t) = a\,s(t)$ and an input
signal contaminated by additive noise, $v_2(t) = b\,s(t) + n(t)$,
the modulation depth of the intensity incident on
the crystal is

$$m = \frac{2ab\,|s(t)|^2}{(a^2 + b^2)\,|s(t)|^2 + \sigma_n^2},$$

where $\sigma_n^2$ is the average noise power.
The reference level which maximizes $m$ is given by
$a = (b^2 + \sigma_n^2/|s(t)|^2)^{1/2}$, corresponding to a modulation
depth of

$$m = \frac{b}{(b^2 + \sigma_n^2/|s(t)|^2)^{1/2}}.$$

In practice, optimizing the reference level can easily be
achieved by setting the power of the reference equal to
the total average power of the input signal.

Normalizing the signal and noise terms such that
$|s(t)|^2 = \sigma_n^2 = 1$, we obtain

$$m = \frac{b}{(b^2 + 1)^{1/2}}.$$

Thus, the minimum input SNR that produces a detectable
correlation peak at the output is
$(S/N)_{\min} = b_{\min}^2 \approx m_{\min}^2$.

From the experimentally measured value of $m_{\min}$,
the correlator should have had a sensitivity of -17 dB.
However, experimental results showed a sensitivity of
-14 dB.
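The sensitivity estimate can be reproduced numerically. This is a sketch under stated assumptions: the noise is uncorrelated with the signal, so it adds only to the dc level, and the modulation depth is 2ab|s|^2 / ((a^2 + b^2)|s|^2 + σ_n^2). The function and parameter names are illustrative.

```python
import math

def noisy_modulation_depth(a, b, noise_power=1.0, signal_power=1.0):
    """Modulation depth for reference a*s(t) and input b*s(t) + n(t) (assumed form)."""
    return 2.0 * a * b * signal_power / ((a * a + b * b) * signal_power + noise_power)

def optimal_reference(b, noise_power=1.0, signal_power=1.0):
    """Reference amplitude whose power equals the total input power."""
    return math.sqrt(b * b + noise_power / signal_power)

def min_snr_db(m_min):
    """Invert m = b / sqrt(b^2 + 1) for b^2 and express it in dB."""
    return 10.0 * math.log10(m_min ** 2 / (1.0 - m_min ** 2))

b = 0.5
a_opt = optimal_reference(b)
m_opt = noisy_modulation_depth(a_opt, b)
sensitivity_db = min_snr_db(0.142)
```

With the measured minimum modulation depth of 0.142 the predicted sensitivity comes out close to the -17 dB quoted in the text.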

V. Conclusion

The photorefractive time-integrating processor that
has been described has several advantageous features:
bias removal, increase in the output space-bandwidth
product, and the ability to directly interface the result
of the time-integrating processor with other optical
systems. Bias-free correlation is desirable because it
allows us to increase the dynamic range and hence the
sensitivity of time-integrating processors. In the implementation
described in this paper, however, the
square-law detection at the output reduces the available
overall dynamic range. A definite improvement
in dynamic range can be obtained if the correlation
that is formed in the photorefractive crystal is
interferometrically detected on the output detector. Another
limitation of the system described here is the long
integration time (several seconds). In some applications
this long integration time is desirable and could
result in extremely good sensitivity (detection of signals
with very low SNR). However, it is certainly
desirable to be able to decrease the integration time to
several milliseconds. This could be accomplished by
increasing the optical power of the writing beams, but
this is in general an impractical solution. Another
limitation of this technique is the relatively low diffraction
efficiency that is obtained with BSO crystals
(2-3%), which reduces the overall light efficiency.
Materials with higher electrooptic coefficients, such as
barium titanate, can provide better efficiency; however,
the time constant obtained with this particular
material is much longer than that of BSO. New photorefractive
materials currently being developed, showing
promise of a large improvement in optical sensitivity
as well as higher electrooptic coefficients, may
provide a substantial improvement in performance
and, specifically, reduce the total optical power that is
required.
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ABSTRACT 


Optical interconnections utilizing volume holography are described. Intrinsic cross-talk
effects that limit the number of independent interconnections are identified and analyzed
by applying coupled-wave analysis. Sampling grids for removing the first-order cross talk
are presented, resulting in a system limited by second- and third-order cross talk only.
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INTRODUCTION 


Optical interconnecting elements that exploit free propagating light waves can
potentially act as a powerful alternative to electrical wiring because free propagating
photons lack the interactive nature of electrons [1]. Optical interconnections can be
particularly useful for the optical implementation of neural computers [2] in which each
processing element is interconnected to many others (typically several thousand). As
an example, a network that is capable of processing images may consist of several
million processing elements (or "neurons") and therefore there is a very large number
of interconnections to be specified in such a system. If the system that is utilized to
simulate all these connections is planar (e.g., an electronic or an optical system that utilizes
a planar medium to specify the connectivity pattern), then the area of the device grows
in  proportion  to  the  total  number  of  connections.  As  an  example,  let  us  assume  that 
the  area  required  to  record  the  strength  of  each  interconnection  is  then  the  total 

area  required  to  simulate  a  network  that  is  comprised  of  10^  connections  is  10cm  x  10cm. 
This makes the fabrication of such a device very difficult and, in the case of the optical
implementation, the size of the optical system becomes exceedingly large. To overcome
this shortcoming, we have previously proposed [3] a holographic optical interconnection
method for utilizing a three-dimensional storage medium which provides a much higher
storage density. In this paper, we derive the interconnection pattern having minimum
cross talk and the signal-to-noise ratio for this interconnecting configuration.

HOLOGRAPHIC  INTERCONNECTIONS 
To identify the fundamental cross-talk effects that limit the available number of
independent interconnections, a global volume holographic interconnection between $N$
input and $N$ output pixels is considered. The arrangement we will be using is shown
in Fig. 1. The input and output pixels are arranged in planes. A lens collimates light
from each input point and therefore the light incident on the crystal in Fig. 1 due to a
single point at the input is a plane wave whose propagation direction is determined by
the position of the pixel. Similarly, an output lens focuses each diffracted plane wave
to a pixel on the output plane. The interconnection between each pair of input-output
points is performed by a separate grating, with the strength of each grating determining
the weight of the connection. Each grating can be recorded with a separate exposure,
which would require a total of $N^2$ exposures. We can reduce the number of required
exposures by forming $N$ multiple holographic exposures [4] as follows. One input point is
turned on during each exposure and the desired connectivity pattern between the selected
input point and all the output points is recorded at the training plane (see Fig. 1). An
exposure of the interference pattern between the two waves is recorded and the process
is repeated for each of the $N$ input points. If we neglect diffraction effects at the crystal
boundaries, then the interconnection pattern consists of perfect sinusoidal gratings, which
include: (1) $N(N-1)/2$ gratings that are recorded by the interference between pixels
that are simultaneously on at the training plane during the recording, and (2) $N^2$ gratings
connecting input and output pixels. For convenience, the former set of $N(N-1)/2$ gratings
are referred to as intra-layer gratings and the latter set of gratings are described as
inter-layer gratings.
An independent interconnection is defined in such a way that the intensity $I_p$ of the
diffracted light wave at the output pixel $p$ is given by

$$I_p = \sum_i \eta_{pi} I_i, \qquad (1)$$

where $i$ is the index that represents input pixels, $I_i$ is the intensity at the input pixel $i$,
and $\eta_{pi}$ is the diffraction efficiency of the grating generated by the interference between
the input pixel $i$ and the pixel $p'$ at the training plane that corresponds to the output pixel
$p$. We have assumed in Eq. (1) that the read-out light is spatially incoherent. This means
that the light intensity reaching each output point is a linear combination of the light
intensities of the input pixels and therefore we have used light intensity as the variable
that describes the system. If the hologram is read out with spatially coherent light, then
the field is the appropriate variable to use. The field is a complex quantity (has both
amplitude and phase) and therefore the coherent case is generally more difficult to analyze
and also implement.
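For incoherent read-out, the ideal interconnection of Eq. (1) is a matrix-vector product between the grating efficiencies and the input intensities. The sketch below uses a small made-up efficiency matrix purely for illustration. The function name and the numbers are assumptions.

```python
def output_intensities(eta, input_intensities):
    """Ideal incoherent interconnection: I_p = sum_i eta[p][i] * I_i."""
    return [sum(e * x for e, x in zip(row, input_intensities)) for row in eta]

# Hypothetical 2 x 2 efficiency matrix; rows index output pixels p,
# columns index input pixels i.
eta = [[0.01, 0.02],
       [0.03, 0.00]]
out = output_intensities(eta, [1.0, 2.0])
```

Each weight is nonnegative because it is a diffraction efficiency, which is one reason the coherent case, with complex field weights, is harder to analyze.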

The cross-talk effect in volume holographic interconnections is defined as the difference
between the actual light intensity $I_p$ obtained at the pixel $p$ and the desirable intensity
of Eq. (1). If we consider only first-order cross talk (i.e., neglecting the contribution of
multiple diffraction) we can write $I_p$ as follows:

$$I_p = \sum_i \eta_{pi} I_i + \sum_i \sum_{(j,l) \neq (i,p)} \eta_{ipjl} I_i, \qquad (2)$$

where $\eta_{ipjl}$ is the diffraction efficiency with which light is diffracted from pixel $i$ to pixel $p$
due to a grating that was recorded by pixel $j$ at the input and $l$ at the output. Coupled-wave
analysis [5] is utilized below to evaluate those cross-talk effects. For this purpose small
diffraction efficiency for an individual grating is assumed. Due to the assumption of small
diffraction efficiency, the primary cross-talk effect can be evaluated by means of a first-order
coupled-wave analysis. In such an analysis, an input light wave at pixel $i$ interacts
with every grating in the volume hologram independently and without an intermediate
rediffraction. Diffracted light waves from the intra-layer gratings do not contribute to
cross-talk effects because of large phase mismatch, and therefore we only need to consider
the first-order cross talk that results from the inter-layer gratings.

Let us consider an output pixel $p$. The light intensity received at $p$ including the
first-order cross-talk effect is given by Eq. (2). The cross-talk diffraction efficiency $\eta_{ipjl}$
calculated from coupled mode analysis is approximately [5]:

$$\eta_{ipjl} = \begin{cases}
\eta_{jl}\,\mathrm{sinc}^2(\Delta k_{ipjl} L / 2\pi) & \text{if } (2\pi/\lambda)\mathbf{n}_i + \mathbf{K}_{jl} = \mathbf{n}_p\,\|(2\pi/\lambda)\mathbf{n}_i + \mathbf{K}_{jl}\| \\
0 & \text{otherwise,}
\end{cases} \qquad (3)$$

where $\mathbf{n}_i$ and $\mathbf{n}_p$ denote unit vectors in the direction of propagation from the input pixel
$i$ and towards the output pixel $p$, respectively. $L$ is the thickness of the crystal and $\Delta k_{ipjl}$
denotes the phase mismatch for the interaction between the grating $\mathbf{K}_{jl}$ that has been
recorded for interconnecting point $j$ to point $l$ and the optical wave emanating from the
input pixel $i$, and it is given by

$$\Delta k_{ipjl} = \|(2\pi/\lambda)(\mathbf{n}_i - \mathbf{n}_p) + \mathbf{K}_{jl}\|. \qquad (4)$$

$\lambda$ is the optical wavelength in the crystal.
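Eqs. (3) and (4) can be evaluated for concrete wave vectors. The following is a minimal sketch, assuming the normalized convention sinc(x) = sin(πx)/(πx), so that the efficiency first vanishes at a mismatch of 2π/L. The wavelength, thickness, angle, tolerance, and all function names are illustrative assumptions, not values from the paper.

```python
import math

def sinc(x):
    """Normalized sinc, sin(pi x)/(pi x); its first zero is at x = 1."""
    return 1.0 if x == 0.0 else math.sin(math.pi * x) / (math.pi * x)

def phase_mismatch(n_i, n_p, grating, wavelength):
    """Eq. (4): || (2 pi / lambda)(n_i - n_p) + K_jl ||."""
    k0 = 2.0 * math.pi / wavelength
    diff = [k0 * (a - b) + g for a, b, g in zip(n_i, n_p, grating)]
    return math.sqrt(sum(c * c for c in diff))

def cross_talk_efficiency(n_i, n_p, grating, wavelength, thickness, eta_jl,
                          tol=1e-9):
    """Eq. (3): eta_jl * sinc^2(dk L / 2 pi) if k_i + K_jl points along n_p."""
    k0 = 2.0 * math.pi / wavelength
    summed = [k0 * a + g for a, g in zip(n_i, grating)]
    norm = math.sqrt(sum(c * c for c in summed))
    if norm == 0.0 or any(abs(c / norm - b) > tol for c, b in zip(summed, n_p)):
        return 0.0  # the diffracted wave does not travel toward pixel p
    dk = phase_mismatch(n_i, n_p, grating, wavelength)
    return eta_jl * sinc(dk * thickness / (2.0 * math.pi)) ** 2

# Illustrative numbers only.
wavelength, thickness = 0.2e-6, 2e-3
k0 = 2.0 * math.pi / wavelength
theta = math.radians(30.0)
n_i = (0.0, 0.0, 1.0)
n_p = (math.sin(theta), 0.0, math.cos(theta))

# Grating recorded for this same pair: exactly Bragg matched.
matched = tuple(k0 * (b - a) for a, b in zip(n_i, n_p))
eta_matched = cross_talk_efficiency(n_i, n_p, matched, wavelength, thickness, 1e-3)
```

A Bragg-matched grating returns its full efficiency. A grating that sends light in the same direction but with a mismatch of 2π/L falls on the first null of the sinc function.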

FIRST  ORDER  CROSS  TALK 

The first-order cross talk can be eliminated if one can arrange input and output pixels
so that all the $N^2$ gratings in Eq. (2), except for the grating $\mathbf{K}_{ip}$ yielding the signal, belong
to one of the two types of gratings defined below. The first type is characterized by the
condition that the phase mismatch given by Eq. (4) is larger than $2\pi/L$, in which case the
diffracted light intensity is very small; for these gratings, the first-order cross-talk effect
can be neglected. The second type consists of gratings for which the diffracted light waves
do not propagate to any one of the output pixels used for the interconnection, in which
case from Eq. (3) we have that $\eta_{ipjl} = 0$ and hence such gratings do not contribute any
light intensity at the pixel $p$ through first-order cross talk. To derive an arrangement that
will ensure that all the recorded gratings satisfy one of the two conditions stated above,
we note that the phase mismatch described by Eq. (4) is determined by the geometry of
the input and output pixels. The wave vector diagram is drawn in Fig. 2, where $\mathbf{k}_i$ and $\mathbf{k}_j$
refer to the input wave vectors and $\mathbf{k}_p$ and $\mathbf{k}_l$ are the output wave vectors. The condition

$$\mathbf{n}_p = \frac{\mathbf{k}_i + \mathbf{K}_{jl}}{\|\mathbf{k}_i + \mathbf{K}_{jl}\|} \qquad (5)$$

states that the unit vector $\mathbf{n}_p$ is in the direction of the vectorial sum of the input vector $\mathbf{k}_i$
and the grating vector. This indicates that the grating $\mathbf{K}_{jl}$ is a grating of the first type,
being capable of diffracting light from $i$ to $p$ unless it is phase mismatched. Therefore,
once a pair $(i, p)$ is selected it is imperative that all the remaining points $(j, l)$ are selected
such that if Eq. (5) is satisfied then $\Delta k_{ipjl}$ is bigger than $2\pi/L$. The degeneracy condition
that must be avoided is

$$\Delta k_{ipjl} = \|\mathbf{k}_i + \mathbf{K}_{jl} - \mathbf{k}_p\| < 2\pi/L. \qquad (6)$$

This condition specifies two strips on the k-space sphere as shown in Fig. 2. The two
strips are parallel circles on the wavenormal sphere. The planes in which the strips lie are
perpendicular to the grating vector $\mathbf{K}_{jl}$. If we select an input-output pair $(i, p)$, then if an
additional input point $j$ is outside the bottom strip in Fig. 2 it will not produce cross talk
to point $p$; if $j$ is within the bottom strip then cross talk will be eliminated if an output
point is not placed at the same location as $l$ along the top strip. If these two criteria can be
met for all input and output pixels, then first-order cross talk is completely eliminated. The
required width of the strip in Fig. 2 is determined by several factors including diffraction
due to the transverse aperture of the hologram, an effect we have not considered in this paper.
The principal factor determining the width is the angular sensitivity of diffraction from a
thick grating, which is determined by the thickness of the crystal. The width of the strip
that is required to satisfy Eq. (6) can be approximated for the purposes of this simplified
exposition by $2\pi/(L\sin\theta)$, where $\theta$ is the angle between $\mathbf{k}_i$ and $\mathbf{k}_p$. This estimate is found
by determining the angular deviation of the incident and diffracted waves from the ideal Bragg
condition that will make $\eta_{ipjl} \approx 0$ (see Eq. (3)).

In the above discussion we have specified the conditions that must be met so that each
grating implements an independent interconnection in the crystal. The remaining task is
to specify the arrangement of input and output pixels in the geometry of Fig. 1 so that the
stated conditions are satisfied. We have developed an entire family of sampling patterns
that accomplish this goal [3]. Shown in Fig. 3 is one such sampling pattern for the input
and output planes. To see why this is the case, consider first the gratings connecting two
input points along the same row to two points in the same row at the output. These
gratings can never be parallel to each other (i.e., fall within the same strip) because the
horizontal (z direction in Fig. 1) difference in position between the input and the output
locations is guaranteed to be different. If we consider two adjacent points in the same
column at the input being connected to two adjacent points in the same column at the
output, then we find that the two gratings connecting them are tilted with respect to each
other in the y-z plane (see the geometry of Fig. 1). In general, gratings connecting points
that are in neither the same row nor the same column are tilted with respect to
each other in all three directions. The patterns in this example are drawn on a 9 x 9 = 81
rectangular grid, and only 9^{3/2} = 27 points are utilized as input and output points in the
input and output planes, resulting in a total number of connections 9^3 = 729. In general, if
the number of points available on a 2-D rectangular grid is S^2, then the number of pixels
that are used for placement of neurons must be N <= S^{3/2} in order to ensure that the
first-order cross talk can be eliminated. Equivalently, if we wish to have N units in the
input or output plane, then the number of resolvable points available must be at least N^{4/3}.
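
The counting in the 9 x 9 example can be checked with a few lines of Python. This is only a sketch: the function names are hypothetical, and the bound N <= S^{3/2} is taken from the 9 x 9 = 81, 27, 729 example in the text rather than derived here.

```python
import math


def usable_pixels(side: int) -> int:
    """Largest integer N with N <= side**1.5 on a side x side grid (assumed bound)."""
    return math.isqrt(side ** 3)


def required_grid_points(n_units: int) -> int:
    """Smallest integer P (grid points) with P**3 >= n_units**4, i.e. P >= N^(4/3)."""
    target = n_units ** 4
    p = round(target ** (1.0 / 3.0))
    while p ** 3 < target:            # correct floating-point rounding from below
        p += 1
    while p > 0 and (p - 1) ** 3 >= target:   # and from above
        p -= 1
    return p


side = 9
n = usable_pixels(side)
print("grid points:", side * side)
print("usable pixels per plane:", n)
print("interconnections:", n * n)
print("grid points needed for", n, "units:", required_grid_points(n))
```

Integer arithmetic is used for the 4/3 power so that the 27-unit case lands exactly on 81 grid points instead of depending on floating-point rounding.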

HIGHER  ORDER  CROSS  TALK 

Second-order cross-talk effects result from light waves that are first diffracted by a
grating from an input wave at pixel i and then rediffracted by a second grating and
directed to the output pixel p. Therefore, two gratings are needed. All the second-order
light waves resulting from diffraction by two intra-layer gratings or two inter-layer gratings
are negligible because in the geometry of Fig. 1 they are phase mismatched, and thus they
do not contribute to second-order cross-talk effects. Therefore, the principal source of
second-order cross talk is diffraction from the inter-layer gratings followed by rediffraction
from the intra-layer gratings. Consider again an output pixel p receiving light from an input
pixel i not directly through diffraction by the grating Kip but through the intermediate
step of diffraction first by an inter-layer grating Kij followed by rediffraction by the grating
Kjp. This is depicted in a k-space diagram in Fig. 4a. Each input wave diffracts light to
all output pixels through inter-layer gratings, and at least N pixels are exactly Bragg
matched to the p-th pixel through rediffraction by the intra-layer gratings. Assuming that
the overall diffraction efficiency is small and therefore neglecting the depletion of the incident
beam, we can easily calculate the second-order signal-to-noise ratio (SNR2), defined as the
ratio of the intensity received at each output pixel due to the direct, first-order diffraction
to the total intensity received due to double diffraction:


(7) 


In the above equation is the average diffraction efficiency for an intra-layer grating. From
Eq. (7) we see that it is desirable to minimize the strength of the intra-layer gratings to
eliminate the second-order cross talk. This can be accomplished by selecting a holographic
recording medium in which low spatial frequencies are recorded weakly. This is, for instance,
typical of gratings recorded in photorefractive crystals in the absence of an applied electric
field, in which case the recording is done principally by diffusion of the carriers. In this
case, gratings whose period is considerably longer than the diffusion length are not recorded
effectively. As an example, if KNbO3:Fe (300 ppm) is utilized [6], the diffraction efficiency
for .2 μm fringe spacing is more than three orders of magnitude larger than the diffraction
efficiency for a fringe spacing of 2.6 μm. Hence if the arrangements of input and output
pixels are chosen such that the spatial frequency of the inter-layer gratings is much higher
than that of the intra-layer gratings, then the effects of intra-layer gratings can potentially
be made negligible compared to third-order cross-talk effects, which we consider next.


Third-order cross talk arises when light originating from the i-th pixel is diffracted
by three separate inter-layer gratings and is ultimately directed at the output pixel p. In
order to calculate the total amount of third-order cross talk we need to determine the total
number of sets of three Bragg-matched inter-layer gratings whose vectorial sum is equal to Kip.
An example of this condition is depicted in Fig. 4b. The input beam in the direction of the
i-th pixel is Bragg matched to grating [7] and similarly, a beam diffracted towards
the l-th output pixel is Bragg matched to (and therefore rediffracted by) gratings. The
ratio of the intensities due to first- and third-order diffraction is


SNRz  = 


36 


(8) 


where η2 is the average diffraction efficiency of an inter-layer grating. The conclusion that
we might draw from Eq. (8) is that as the network becomes larger (i.e., N increases) the
signal-to-noise ratio deteriorates, and therefore third-order cross talk imposes a limit on
N. In fact, ^3 = riofN^ [4] where rio » Iva the diffraction efficiency obtained when only
a single grating is recorded in the crystal. Substitution into Eq. (8) reveals that SNR3
is proportional to which implies that for large networks third-order cross talk is not
expected to be a serious concern.


CONCLUSION 

We  have  used  coupled  mode  analysis  to  derive  a  simple,  approximate  result  for  the 
conditions  that  must  be  met  in  order  for  each  grating  that  is  recorded  in  a  volume  hologram 
to implement an independent interconnection between two points in space. Since the
number of gratings that can be stored in a volume medium is on the order of V/λ^3 [8],
where V is the volume of the crystal and λ is the wavelength, the result reported here
can  make  possible  the  design  of  optical  networks  with  extremely  high  storage  density.  The 
effects  of  second  and  third-order  diffraction  were  calculated  and  it  is  shown  that  these 
effects  can  impose  a  limit  on  the  number  of  units  that  can  be  interconnected  with  the 
same  crystal,  since  the  signal  to  noise  ratio  decreases  monotonically  as  N  increases.  There 
are  of  course  several  other  factors,  beyond  the  basic  geometric  constraints  treated  in  this 
paper,  which  need  to  be  taken  into  consideration  in  order  to  gain  a  complete  understanding 
of  the  capabilities  of  volume  holograms  for  implementing  global  interconnections.  Most 
significantly,  the  effects  of  the  recording  mechanism  and  the  limitations  it  imposes  on  the 
number of interconnections that a single hologram can implement [4] must be addressed and
combined with the results reported here. This will be the subject of a future publication.
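
To give a feel for the V/λ^3 estimate quoted in the conclusion, the short Python sketch below evaluates it for an illustrative crystal. The 1 cm^3 volume and 0.5 μm wavelength are assumed values chosen for the example, not figures from the paper, and the step from gratings to units assumes one grating per connection in a fully connected N-to-N network.

```python
import math


def max_gratings(volume_m3: float, wavelength_m: float) -> float:
    """Order-of-magnitude grating count V / lambda^3 for a volume hologram."""
    return volume_m3 / wavelength_m ** 3


# Assumed example values: a 1 cm^3 crystal read out at 0.5 micrometers.
volume = 1e-6          # cubic meters
wavelength = 0.5e-6    # meters

gratings = max_gratings(volume, wavelength)
# With N^2 gratings needed for N inputs fully connected to N outputs,
# the geometric capacity alone would allow roughly sqrt(V / lambda^3) units.
units = math.sqrt(gratings)

print(f"gratings ~ {gratings:.1e}")
print(f"units per plane ~ {units:.1e}")
```

This is a geometric upper bound only; as the text notes, the recording mechanism imposes further limits that are not modeled here.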

FIGURE CAPTIONS


Fig. 1 Illustration of the proposed interconnection scheme between an input point i and an
output point p. One grating is stored by interfering the two beams coming from
point sources i and p' after passing through a Fourier transforming lens. Point p'
is the inverted image of point p. After storing the grating, light coming from point
i is diffracted by the grating and focused on point p. Therefore, the stored grating
interconnects points i and p.

Fig. 2 k-space diagram illustrating the degeneracy of the gratings that connect points (i, p)
and (j, l).

Fig. 3 Sampling patterns on 9 x 9 rectangular grids.

Fig. 4 Wave-vector matching diagram illustrating the mechanism through which a) second-
and b) third-order cross talk is introduced at each output pixel p.

Multilayer optical learning networks

Kelvin Wagner and Demetri Psaltis


A new approach to learning in a multilayer optical neural network based on holographically interconnected
nonlinear devices is presented. The proposed network can learn the interconnections that form a distributed
representation of a desired pattern transformation operation. The interconnections are formed in an
adaptive and self-aligning fashion as volume holographic gratings in photorefractive crystals. Parallel arrays
of globally space-integrated inner products diffracted by the interconnecting hologram illuminate arrays of
nonlinear Fabry-Perot etalons for fast thresholding of the transformed patterns. A phase conjugated
reference wave interferes with a backward propagating error signal to form holographic interference patterns
which are time integrated in the volume of a photorefractive crystal to modify slowly and learn the
appropriate self-aligning interconnections. This multilayer system performs an approximate implementation
of the backpropagation learning procedure in a massively parallel high-speed nonlinear optical network.


I. Introduction
There has been considerable interest in the optics
community in recent years in the optical implementation
of neural network models, and these have been
considered principally for associative memory applications.
Incoherent optoelectronic implementations
of matrix vector multipliers with nonlinear electrical
feedback were used to demonstrate that imperfect analog
hardware worked surprisingly well in the robust
environment of a neural network. Holographic association
with coherent light can be combined with optical
nonlinearities within a strongly pumped phase conjugate
mirror, or with the nonlinear thresholding
capabilities of an optical spatial light modulator, to
implement image association. Volume holograms can
be repetitively exposed to a number of Bragg angle
multiplexed connectivity patterns to produce a holographic
interconnection matrix. These systems are
programmed to perform a fixed operation by precalculating
the interconnections with an easy learning procedure,
so that fixed points of the idealized neural
dynamics are the desired associative recall. One of the
most intriguing properties of a neural network is the
ability to learn dynamically the interconnections that
correspond to a desired behavior through an iterative
adaptation of the weight matrix through outer product
perturbations. Optical implementations of adaptive
associative memories using optoelectronic components
and spatial light modulator technology have
been suggested. A fascinating all-optical nonlinear
dynamical system for adaptive association based on a
saturating cubic nonlinearity in a phase conjugating
dynamic volume holographic resonator has been proposed.
An even more powerful learning paradigm,
sometimes called hard learning, involves either error
driven learning, reinforcement learning, or self-organizing
principles. A hybrid electrooptical approach
to Boltzmann learning has been proposed that is based
on an incoherent optoelectronic matrix-vector multiplier
interfaced with a microcomputer. Error driven
behavioral modification has the ability to sense system
performance and adapt the synaptic weights in a manner
which will compensate for some of the device imperfections
and interconnection misprogrammings
that caused the unwanted behavior. This paper explores
the match between the backpropagation error
driven multilayer learning procedure and optical
networks, while ignoring the biological implausibility
of bidirectional synapses, because of the intrinsic
bidirectionality of optical interconnections. This system
is a feed forward multilayer perceptron which has
the potential of more general computationally universal
behavior than single-layer associative networks.
However, it differs from the recurrent networks because
all the feedback dynamics are involved in training
the modifiable interconnections and not in processing
the input. We propose a new optical implementation
of this multilayer learning system which
uses self-aligning volume holograms to bidirectionally
interconnect nonlinear etalons which act as the bidirectional
optical neurons. This architecture combines
the robustness of the distributed neural computation
and the backpropagation learning procedure with the
high speed processing of nonlinear etalons, the self-aligning
ability of phase conjugate mirrors, and the
massive storage capacity of volume holograms to produce
a powerful and flexible parallel optical processor.

Fig. 1. Optical backward error propagation architecture with polarization
multiplexed forward and backward waves, nonreciprocal
polarization filtering, and self-aligning polarization switching volume
hologram.

One version of a single layer of this optical backpropagation
architecture is shown in Fig. 1, and the
operation is briefly described before discussing the
idealized backpropagation algorithm and the details of
this optical implementation. The learning algorithm
in this single-layer optical perceptron begins with the
repetitive presentation to the network input of the set
of training patterns in a uniformly random sequence.
Initially, the system gives rise to a sequence of output
patterns through the holographic interconnection and
output nonlinearity, which is different from the desired
target response sequence. An error pattern is
formed, either electronically or optically, by taking the
difference between the actual output pattern and the
targeted response. The difference pattern is sent
backward through the output neurons and into the
network using the same etalons and holographic interconnections,
but encoding the error with an orthogonal
polarization, or a slightly different frequency, or
pulsed at a jittered time relative to the forward-propagating
signal. This multiplexing of the forward and backward
waves in orthogonal eigenmodes avoids direct
interference between these waves. Meanwhile, the
undiffracted portion of the input pattern is phase conjugated
by an auxiliary phase conjugate mirror, which
retroreflects each component of the input wavefront
back toward the position at the input from which it
originated. The phase conjugate beam has the polarization
rotated or the wavelength shifted to match the
error encoding to act as a self-aligning reference beam
for the backward-propagating error wavefront. A volume
hologram is recorded within the photorefractive
crystal as the interference pattern between the phase
conjugated input pattern and the backward propagating
error signal. This is mathematically equivalent to
changing the holographic connectivity matrix by the
outer product of signal and error pattern vectors. The
next time that this particular input pattern is presented
to the network, it produces a diffraction pattern
that more closely resembles the desired output pattern.
Eventually, the hologram will learn the correspondence
between a set of input patterns and the
associated responses as long as the set of input patterns
is linearly separable, which implies that a holographic
interconnection exists that produces the desired pattern
transformation. Since the holographic reference
wave is generated by a phase conjugate mirror, as the
network learns it will also self-align as well as correct
for some of the optical imperfections present in the
system components.
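
The single-layer procedure described above (random presentation, error formation, outer-product update of the connectivity matrix) can be mimicked numerically. The NumPy sketch below is an idealized illustration under stated assumptions, not a model of the optics: the logistic sigmoid, the learning rate, the constant bias input, and the OR task are all choices made for the example.

```python
import numpy as np


def sigmoid(x):
    """Assumed soft-threshold response of the output nonlinearity."""
    return 1.0 / (1.0 + np.exp(-x))


def train_single_layer(patterns, targets, rate=0.5, presentations=2000, seed=0):
    """Present patterns in uniformly random order; update weights by outer products."""
    rng = np.random.default_rng(seed)
    weights = np.zeros((targets.shape[1], patterns.shape[1]))
    for _ in range(presentations):
        p = rng.integers(len(patterns))
        signal = patterns[p]
        output = sigmoid(weights @ signal)
        error = targets[p] - output                  # target minus actual output
        weights += rate * np.outer(error, signal)    # outer product of error and signal
    return weights


# Linearly separable example: logical OR, with a constant third input as a bias.
patterns = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
targets = np.array([[0], [1], [1], [1]], dtype=float)

weights = train_single_layer(patterns, targets)
recalled = np.round(sigmoid(patterns @ weights.T))
print(recalled.ravel())
```

Because the task is linearly separable, a single weight matrix suffices, which is the condition the text places on the single-layer system.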

When the desired pattern transformation is not linearly
separable, as in most difficult problems of interest,
it is necessary to adaptively implement more complex
nonlinear decision surfaces. One way that this
can be accomplished is by stacking these single-layer
networks up to form a multilayer network of holographically
interconnected nonlinear devices that is
trainable by backpropagating the error signal through
the layers. When the error pattern strikes the hologram,
part of it is diffracted toward the previous layer
of nonlinear devices, known as hidden units, by the
transpose of the interconnection matrix seen by the
forward-propagating patterns, which is the necessary
connectivity for backpropagating the error. The
backpropagation algorithm also requires that the
transmission function of the hidden units to backward-propagating
signals be the derivative of the forward
mode sigmoid transfer function evaluated at the
current operating level of each device. The derivative
is peaked where the nonlinear sigmoid transfer characteristic
has a large differential gain, so that if the
hidden unit is operating in this region the connections
leading to it will be strongly modified by the efficiently
transmitted error signal, thereby helping that neuron
to decide that it should be either high or low on subsequent
presentations and not between. The multiple
layers of interconnections will be continuously modified
until all the patterns within the training set produce
outputs very near the flat upper or lower levels of
the nonlinear device sigmoid response, so that the
error signals are not allowed to backpropagate through
the network. When convergence is reached, the error
signals that are generated at the final layer become
very small for all members of the training set.

Fig. 2. Two-layer network for backpropagation learning, feed forward
equations, backward-error-propagation equations, and learning
rule.

II. Backpropagation Learning Procedure
In this section we briefly review the derivation of the
backward error propagation learning procedure to
establish the notation and encapsulate the system
characteristics that the optical architecture must incorporate.
A schematic representation of a two-layer
network is shown in Fig. 2, which consists of an input
layer globally interconnected to a hidden layer, which
is interconnected through a second weighted communication
network to an output layer. The interconnection
strengths are modifiable, so that the system can be
trained to perform a desired pattern transformation
from the input space to the output space. The binary
signals applied to the input layer of N1 neurons are
reproduced at the output of these neurons as binary
outputs, which are the inputs to the first layer, so that
o_i^{(1)} = i_i. The outputs of the first layer are interconnected
through an N2 x N1 weight matrix w_{ji}^{(1)} to a hidden
layer consisting of N2 neurons, forming presynaptic
input strengths which are linear combinations of the
outputs from the previous layer:

(1) 

The hidden layer of neurons performs a soft thresholding
operation on these presynaptic inputs, with a nonlinear
sigmoid response f(x), forming the outputs of the
hidden layer, which become the inputs to the second
layer:

(2) 

The outputs of the hidden layer are interconnected
through the N3 x N2 weight matrix w_{kj}^{(2)}, which gives
the N3 presynaptic network inputs to the final output
layer as a linear combination of the hidden layer outputs:

(3)

The final layer performs the same nonlinear soft
thresholding operation as the hidden layer, giving the
N3 network outputs:

(4)

These outputs represent the response of the network
for a given set of inputs i_i, and it is the job of the
training procedure to modify the interconnection
weight matrices so that the actual response closely
approximates the desired system response. Not all
input-output mappings are possible in a network of a
specific size, but complex problems of a cognitive nature
with fuzzy decision boundaries have been efficiently
performed in a multilayer network of this
type.
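
The feed-forward pass described in words above (linear combination, soft threshold, second linear combination, second soft threshold) can be written compactly in NumPy. This is a sketch under assumptions: the variable names (`net2`, `o2`, and so on) and the logistic form of the sigmoid are illustrative choices, since the printed equations are not legible in this copy.

```python
import numpy as np


def sigmoid(x):
    """Assumed form of the soft-thresholding response f(x)."""
    return 1.0 / (1.0 + np.exp(-x))


def forward(inputs, w1, w2):
    """Two-layer feed-forward pass.

    w1 has shape (N2, N1) and connects the input layer to the hidden layer;
    w2 has shape (N3, N2) and connects the hidden layer to the output layer.
    """
    o1 = np.asarray(inputs, dtype=float)   # first-layer outputs equal the inputs
    net2 = w1 @ o1                         # presynaptic inputs to the hidden layer
    o2 = sigmoid(net2)                     # hidden-layer outputs
    net3 = w2 @ o2                         # presynaptic inputs to the output layer
    o3 = sigmoid(net3)                     # network outputs
    return o1, net2, o2, net3, o3


# Example with N1 = 3 inputs, N2 = 2 hidden units, N3 = 1 output.
w1 = np.zeros((2, 3))
w2 = np.zeros((1, 2))
o1, net2, o2, net3, o3 = forward([1, 0, 1], w1, w2)
print(o2, o3)
```

Returning the intermediate presynaptic values is deliberate: the backward pass needs the operating point of each neuron to evaluate the slope of its response.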

The desired response for the input i_i(n), presented at
the input of the network on the nth machine cycle, is
given by a target vector t_k(n), which differs from the
network output o_k(n), so a network error vector is
given by δ_k(n) = [t_k(n) - o_k(n)]. A positive definite
mean-squares error (MSE) energy functional can be
formed to characterize the system's behavior, and minimizing
this function for all n will improve the quality of
the behavior of the multilayer network:

E(n) = \frac{1}{2} \sum_{k=1}^{N_3} [t_k(n) - o_k(n)]^2    (5)

A gradient descent procedure can be employed to modify
the elements of the weight matrices and push them
in the direction that improves the network performance,
as measured by the MSE energy function, on
subsequent presentations of a given pattern:

\Delta w_{ij}^{(m)} = -\eta \frac{\partial E}{\partial w_{ij}^{(m)}}    (6)


This weight update rule is designed to move the
weights in a direction that rolls down the gradient of
the energy surface in an amount which is proportional
to the local slope. Ideally, the energy function should
be averaged over the entire set of training patterns, so
that the modification of the weight matrices is in the
appropriate direction to improve the system response
for the entire training set. However, a temporally
localized learning can be performed by using a small
acceleration coefficient η and modifying the weights
after individual pattern presentations. The modification
of the weights that results after cyclically presenting
the training set in arbitrary order many times can
approximate the desired change. The gradient descent
is calculated by using the chain rule and representing
the derivative of the energy function with respect
to the weight matrix elements as a product of two
parts, the backpropagating error and the forward-propagating
signal:


dE 

aui,7* 


(7) 


The derivative of the energy with respect to the presynaptic
input to the mth layer is defined to be
which is the backpropagating error signal in that layer.
In the final layer this term is similar to the standard
form of a least-mean-squares (LMS) error signal, as
originally derived for the single-layer Adaline:


(8) 


Fig. 3. Bidirectional neuron for backpropagation, its forward-mode
saturating nonlinearity, and magnified derivative.

The first term is found by directly differentiating
the energy function, which yields the standard error
signal used in adaptive filters, and the second term is
found by differentiating the nonlinear response of the
neurons. The significance of Eq. (9) is that to translate
the kth component of the output error vector back
into the final layer of the network it must simply be
multiplied by a value which is locally computable within
the kth output neuron. Thus the network output
error function δ_k = (t_k - o_k) can be sent back through
the corresponding output layer neurons, which multiply
the error component by the derivative of the nonlinear
sigmoid response at the current operating level
of that output neuron. The error signal which is used
to program the weights of the final layer is propagated
back through those weights by multiplying by w_{kj}^{(2)}, and
all the appropriately weighted error signals converging
on the jth hidden neuron are summed to form a backpropagating
presynaptic network input. The weighted
sum of the error functions transmitted in the backward
direction by the final layer is computed using the
same interconnection matrix seen in the forward processing
mode, but summing over the N3 output neurons
using the transpose of the matrix which is used for
the forward-propagating interconnection:

(9)


This represents an iterative algorithm for successively
computing the error function at deeper layers back
toward the beginning of the network in terms of the
error function injected back into the final layer. Alternatively,
this algorithm can be considered to represent
a wavefront that backpropagates through the network,
multiplying by the weights, accumulating at the
neurons, and multiplying by the neurons' backward
transmittance to compute the appropriate error to
program the previous layer. The network is highly
nonlinear in the forward-propagating direction, but
the backpropagating wavefront is computed using only
linear operations.
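
The backward pass described above uses only multiplications and sums once the forward operating points are fixed. The NumPy sketch below illustrates this under assumptions: the logistic sigmoid (and hence its slope) and all variable names are choices made for the example, not the paper's notation.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def sigmoid_slope(net):
    """Derivative of the assumed logistic response at presynaptic level `net`."""
    s = sigmoid(net)
    return s * (1.0 - s)


def backward(target, o3, net3, net2, w2):
    """Propagate the output error back to the hidden layer.

    The output error is multiplied by the local slope of each output neuron,
    sent through the transpose of the forward weight matrix w2, summed at each
    hidden neuron, and multiplied by that neuron's local slope.
    """
    output_error = target - o3
    err3 = output_error * sigmoid_slope(net3)    # error that programs the final weights
    err2 = (w2.T @ err3) * sigmoid_slope(net2)   # error that programs the first weights
    return err3, err2


# Small worked case: one output neuron and two hidden neurons, all at net = 0.
w2 = np.array([[2.0, -4.0]])
err3, err2 = backward(np.array([1.0]), np.array([0.5]),
                      np.array([0.0]), np.array([0.0, 0.0]), w2)
print(err3, err2)
```

With every neuron at the midpoint of its response the slope is 0.25, so the output error of 0.5 becomes 0.125 after the output neuron and is then split by the transposed weights.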

The neurons must, therefore, have two signal pathways
as shown in Fig. 3. The two pathways share the
same weights on the connected layers, but the neuron
response is a nonlinear soft thresholding for forward-propagating
signals and a multiplier that only allows a
backward propagating error through the neuron when
the slope of the forward mode operation is large. The
transmitted components of the backpropagating error
vector are only large when the corresponding output
neurons are operating in the steep thresholding regime
where the derivative is large, and that component was
significantly in error at the network output. Any neuron
that had decided that it is a one or zero by being
well above or below the threshold knee inefficiently
transmits the error back into the previous layer of the
network. From the definition of the change in the
weight matrix given in Eq. (6) and the chain rule expansion
of Eq. (7) we can write the form of the weight
update rule for the mth layer according to this first-order
gradient descent procedure:

w_{ij}^{(m)}(n + 1) = w_{ij}^{(m)}(n) + \Delta w_{ij}^{(m)}(n)    (10)

The error transmitted by the neurons back into the
previous layer of interconnections is used to modify
the weights of that layer through this outer product
update rule. The weighted interconnection is
carrying the output from the jth neuron in the mth
layer to the kth neuron in the (m + 1)st layer,
which is simultaneously broadcasting the error function
back into the mth layer of the network. The
product of this forward-propagating signal and backward-propagating
error takes place within each
weighted synaptic connection as the desired weight
update contribution, completely independently of
what is taking place within all the other weighted
connections, and this is the only information needed to
update that weight, so this learning rule can be said to
be a local update rule. The training procedure for the
final layer of weights is given by an appropriate outer
product learning rule, which is a local update rule that
takes place within each weighted signal pathway, but
the problem of credit assignment of the MSE energy to
the earlier layer has been solved by nonlocally backpropagating
the error vector. This is referred to as the
backward error propagation algorithm for training
multilayer networks, and it can be further generalized
to N layer networks or networks with feed forward
interconnections, e.g., when the first layer connects
directly with the output layer as well as indirectly
through the hidden layer to the output layer. For
more details of the derivation, operation, and utility of
this multilayer network training algorithm the reader
is referred to Refs. 1, 2, and 18.
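
Putting the pieces of this section together, the following NumPy sketch trains a two-layer network with the local outer-product update after each pattern presentation. It is a numerical illustration only, under assumptions not taken from the paper: a logistic sigmoid, the XOR task with a constant bias input, four hidden units, and the particular learning rate and random seed.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def total_error(patterns, targets, w1, w2):
    """Half the summed squared output error over the whole training set."""
    outputs = sigmoid(sigmoid(patterns @ w1.T) @ w2.T)
    return 0.5 * np.sum((targets - outputs) ** 2)


def train(patterns, targets, n_hidden=4, rate=0.5, presentations=20000, seed=1):
    rng = np.random.default_rng(seed)
    w1 = rng.uniform(-1.0, 1.0, (n_hidden, patterns.shape[1]))
    w2 = rng.uniform(-1.0, 1.0, (targets.shape[1], n_hidden))
    start = total_error(patterns, targets, w1, w2)
    for _ in range(presentations):
        p = rng.integers(len(patterns))              # random presentation order
        o1 = patterns[p]
        o2 = sigmoid(w1 @ o1)                        # forward pass
        o3 = sigmoid(w2 @ o2)
        err3 = (targets[p] - o3) * o3 * (1.0 - o3)   # output error times local slope
        err2 = (w2.T @ err3) * o2 * (1.0 - o2)       # backpropagated through transpose
        w2 += rate * np.outer(err3, o2)              # local outer-product updates
        w1 += rate * np.outer(err2, o1)
    return w1, w2, start, total_error(patterns, targets, w1, w2)


# XOR is not linearly separable, so a hidden layer is needed.
patterns = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
targets = np.array([[0], [1], [1], [0]], dtype=float)

w1, w2, start_error, final_error = train(patterns, targets)
print("error before training:", start_error)
print("error after training:", final_error)
```

Each weight change depends only on the signal entering that connection and the error leaving it, which is the locality property the text emphasizes; the hidden-layer error is computed before the second weight matrix is updated so that both updates use the same forward pass.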

III. Optical Implementation

The optical implementation of a backpropagation
network requires two basic bidirectional components,
the interconnection matrices and the nonlinear units.
Volume holograms appear as the most promising candidate
for implementation of an interconnection matrix
because of the large storage capacity possible within
the volume of a crystal and the dynamic response
possible with a photorefractive crystal. The readout
of a volume hologram can be accomplished with either
a forward-propagating beam or a backward-propagating
beam. Spatial light modulators (SLMs) could also be
used as the interconnection element for small networks,
but they would have to be both bidirectional
and optically addressable to be used in a backpropagation
network. In this paper a new self-aligning approach
to adaptively forming optical interconnections
based on phase conjugating one of the undiffracted
beams is presented. This technique uses interferometric
detection in the volume of a photorefractive
crystal to accomplish all the outer product multiplications
necessary for weight matrix perturbation.
These N1 x N2 weight updates are calculated in parallel
by exposing the crystal with N1 phase conjugated
collapsing spherical waves and N1 expanding spherical
waves simultaneously.

The  nonlinear  units  or  neurons  need  to  threshold  the 
forward-propagating  beam  while  transmitting  the 
backward  beam  only  when  the  forward  beam  nonlin¬ 
earity  is  in  the  high  slope,  or  undecided,  regime  of 
operation.  A  special  purpose,  bidirectional,  detector 
modulator pair array structure could be tailored to
generate  the  desired  backpropagation  neuron  respons¬ 
es  by  utilizing  the  appropriate  integrated  electronic 
circuitry,  but  the  individual  neurons  could  become 
quite  complicated  with  this  conventional  optoelec¬ 
tronic  integrated  circuit  approach.  Appropriately 
modified  transmissive  spatial  light  modulators  might 
be  considered  for  backpropagation  neurons,  and  one 
possible  structure  of  this  type  is  illustrated  in  Fig.  4. 
In  this  type  of  birefringent  SLM,  crossed  polarizers  are 
placed  on  either  side  of  an  electrooptic  medium  which 
is  optically  addressed  by  a  photoconductor.  A  high 
voltage is applied across a transparent conductor in
contact with the photoconductor on one side and a
transparent conductor on the other. To use this type of
electrooptic device as a backpropagation neuron the
induced  birefringence  must  be  doubled  for  the  back- 


Fig. 4. Input-output relations for a special purpose bidirectional
optically addressed spatial light modulator backpropagation
neuron: PC = photoconductor; EO = electrooptic; TC = transparent
conductor.


ward  propagating  wave  to  obtain  a  saturating  forward 
nonlinearity  while  obtaining  a  derivative  backward 
multiplication.  This  can  be  accomplished  by  a  pair  of 
photoconductors, both addressed by the same forward
propagating beam, where one is used to modulate the
forward-propagating device which is biased with a
voltage $V_\pi$, while the other is used to modulate an EO
device with a saturation voltage $2V_\pi$. The forward-propagating
modulator is used to modulate a fixed
intensity pump $I_p$ so that a single half-cycle of a saturating
nonlinearity can be generated,
$I_f^{out} = I_p \sin^2(\pi I_f^{in}/2I_{sat})$ for $I_f^{in} < I_{sat}$,
and $I_f^{out} = I_p$ otherwise. The backward-propagating
modulator is used to multiply the
backward-propagating error signal by a function
$I_b^{out} = I_b^{in} \sin^2(2\pi I_f^{in}/2I_{sat})$ for $I_f^{in} < I_{sat}$,
and $I_b^{out} = 0$ otherwise,
and this is of the form of the desired derivative
multiplication.  Since  the  two  functions  required  of  a 
backpropagation  neuron  can  also  be  accomplished 
with  a  simpler  nonlinear  resonator  structure,  and  the 
response  time  of  these  nonlinear  etalons  can  be  ex¬ 
tremely  fast  compared  to  SLM  technology,  they  were 
chosen  for  study  in  the  architecture  presented  in  this 
paper. 
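As a rough numerical check of the SLM-based neuron idea, the sketch below compares a sin-squared forward response with a doubled-birefringence backward gain. The functional forms are assumptions read from the surrounding description (a single half-cycle of sin^2 that saturates at an intensity `i_sat`, and a backward multiplier with twice the phase argument); the function names and the unit normalization are hypothetical.

```python
import math


def forward_response(i_in, i_pump=1.0, i_sat=1.0):
    """Assumed forward nonlinearity: half-cycle of sin^2, then saturation."""
    if i_in >= i_sat:
        return i_pump
    return i_pump * math.sin(math.pi * i_in / (2.0 * i_sat)) ** 2


def backward_gain(i_in, i_sat=1.0):
    """Assumed backward multiplier: same device with doubled birefringence."""
    if i_in >= i_sat:
        return 0.0
    return math.sin(math.pi * i_in / i_sat) ** 2


def forward_slope(i_in, i_pump=1.0, i_sat=1.0):
    """Analytic derivative of forward_response with respect to i_in."""
    if i_in >= i_sat:
        return 0.0
    return i_pump * (math.pi / (2.0 * i_sat)) * math.sin(math.pi * i_in / i_sat)


for n in range(0, 11):
    x = n / 10.0
    print(f"{x:4.1f}  f={forward_response(x):.3f}  "
          f"df={forward_slope(x):.3f}  g={backward_gain(x):.3f}")
```

Under these assumptions both the true slope and the backward gain vanish at zero and at saturation and peak at half the saturation intensity, so the backward gain is a peaked stand-in for the derivative rather than an exact match.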

A.  Nonlinear  Fabry-Perot  Backpropagation  Neurons 

Nonlinear Fabry-Perot etalons are a promising
candidate for implementing the neurons in an optical
learning network because they can perform nonlinear
operations on arrays of coherent beams, which allows
the outputs to be used to record and modify interconnection
holograms. A soft thresholding operation can
be performed on a forward-propagating beam by decreasing
the cavity detuning below the critical detuning
needed for bistability. These optical neurons
cannot easily implement the idealized derivative
transmission required for backpropagation, but a similar
peaked response can be obtained by operating a
nonlinear etalon in the probe mode for the backward-propagating
error signal. In this mode, the Fabry-Perot
resonance is scanned by the nonlinear dependence
of the index on the intracavity intensity, which
varies in response to the high power forward beam
intensity. The weak backward-propagating probe
beam does not scan the cavity, but it is modulated by
the current state of the cavity transmission function,
which is the appropriate multiplication type of response
needed in the backward direction. The probe
mode transmission is peaked at the resonance of the
Fabry-Perot, which occurs when the sigmoid response
to the forward beam reaches the upper level. The
peak maximum is not exactly at the region of the
highest slope of the forward beam's nonlinear sigmoid
response, but since the forward and backward beams
have different polarizations, or different wavelengths,
the resonance function can be offset to achieve a properly
positioned probe beam resonance peak.

In  the  polarization  multiplexed  case  this  shift  can  be 
induced by including a thin birefringent sheet in the
cavity, or perhaps a tunable birefringence can be
caused by applying a static external field to the cavity.
This type of birefringent nonlinear Fabry-Perot etalon
and a simulation of the forward-mode sigmoid transfer
function are shown in Fig. 5 along with its derivative and
the shifted probe mode response approximation to this
derivative. This device can implement the desired
sigmoid nonlinearity of the high intensity forward-propagating
signals with a differential gain greater
than one, although the actual gain in transmission is
less than one. The probe mode response is not symmetric
about the peak because the Airy function resonance
is scanned by the intracavity intensity which is
equal to the transmitted sigmoid response divided by
the backmirror transmittance. This asymmetry continues
to allow signals that are above threshold to build
up interconnection gratings in the previous stage corresponding
to correlated inputs, thereby partially
compensating for the slow forgetting of gratings by the
volume hologram. However, the high level of transmission
for the probe beam when the etalon pump is
below threshold is undesirable. By decreasing the
finesse of the cavity to the forward-propagating beam a
trade-off can be made between the peak width and
off-resonance transmission of the probe mode response
and the switching energy of the forward-propagating
nonlinear device characteristic.

Fig. 5. Nonlinear Fabry-Perot etalon sigmoid response, its derivative,
[...] an auxiliary intracavity birefringence.

Fig. 6. Dual-cavity nonlinear Fabry-Perot etalon with forward-propagating
nonlinear response and backward-propagating scanned
resonance probe mode transmission.
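The forward sigmoid and the shifted probe-mode response can be explored with a simple parametric model. This is a hedged sketch, not the simulation behind Fig. 5: it assumes a lossless Airy transmission, an intensity-dependent phase equal to the normalized intracavity intensity, and hypothetical parameter values (`detuning`, `finesse_coeff`, `back_mirror_t`, `probe_offset`). The intracavity intensity is used as the free parameter so that no implicit equation has to be solved.

```python
import math


def airy(phase, finesse_coeff):
    """Lossless Fabry-Perot (Airy) transmission for a round-trip half phase."""
    return 1.0 / (1.0 + finesse_coeff * math.sin(phase) ** 2)


def etalon_curves(detuning=-0.6, finesse_coeff=5.0, back_mirror_t=0.1,
                  probe_offset=0.25, steps=200, max_cavity=1.2):
    """Return (input, output, probe transmission) triples.

    The intracavity intensity i_cav is normalized so that it equals the
    nonlinear phase shift in radians (an assumption of this sketch).
    """
    rows = []
    for n in range(steps + 1):
        i_cav = max_cavity * n / steps
        phase = detuning + i_cav              # resonance scanned by i_cav
        i_out = back_mirror_t * i_cav         # transmitted = T_back * intracavity
        i_in = i_out / airy(phase, finesse_coeff)
        # Probe beam sees the same cavity with an extra birefringent offset.
        t_probe = airy(phase + probe_offset, finesse_coeff)
        rows.append((i_in, i_out, t_probe))
    return rows


rows = etalon_curves()
for i_in, i_out, t_probe in rows[::20]:
    print(f"in={i_in:.4f}  out={i_out:.4f}  probe={t_probe:.3f}")
```

With these illustrative values the input-output curve is single valued (below the bistability threshold), its steepest part has a differential gain above one while the transmission itself stays at or below one, and the probe offset moves the probe peak ahead of the forward resonance, toward the high-slope region.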

Another possibility would be to use two closely
spaced cavities, both addressed by the same forward-
and backward-propagating resolution spots, as illustrated
in Fig. 6. In this case one cavity is optimized to
produce a sigmoid response of the forward-propagating
beam while blocking the backward-propagating
error signal, while the other cavity is resonant to the
backward-propagating beam. The Fabry-Perot resonance
of the backpropagating cavity is linearly
scanned by the 100% reflected forward-propagating
incident intensity, thereby producing a good approximation
to the desired symmetric derivative response.
We expect that learning and eventual convergence can
be achieved in a multilayer optical network with the
forward and backward responses that can be obtained
from these scanned resonance devices, even though the
responses do not precisely match the nominal responses
of the backpropagation algorithm, because of the
robustness of the gradient descent learning procedure.

B. Description of a Single Layer of the Optical
Architecture

A single layer of an architecture that can perform
this type of multilayer perceptron learning procedure,
using polarization multiplexing of the forward-propagating
processing beam and backward-propagating
teaching beam, is shown in Fig. 1. The illustrated
architecture is one implementation of this class of
backward-error-propagating holographic learning machines
that serves to illustrate the principles involved.
Notice that no lenses are shown in this diagram because
the volume hologram can perform the desired
weighted interconnection imaging by exposing it with
the proper expanding image and focusing reference
beam to form a Fresnel volume hologram. If Fourier
lenses are inserted between the etalon arrays and the
volume holographic crystal, the exposed hologram will
be a Fourier hologram with planar fringes, and the
momentum space analysis will be simplified, but the
processor learning and self-aligning operations will be
similar.

The forward-propagating pattern vector transmitted
by the anisotropic nonlinear etalon array on the
left-hand side of the figure is polarized at a -45° angle
and is rotated clockwise by 45° as it passes through the
nonreciprocal Faraday rotator so that it becomes horizontally
polarized. This aligns the forward-propagating
beam with the polarizer, allowing it to pass and
illuminate the polarization switching volume hologram.
The diffracted beam consists of a weighted
interconnection of the forward-propagating pattern
vector by the current state of the holographically represented
interconnection matrix, stored as a superposition
of curved and chirping space charge gratings
within the photorefractive crystal. The diffracted
beam is polarization switched by the birefringent diffraction
mechanism to an orthogonal polarization to
the input, and this vertical polarization state is rotated
clockwise by 45° through the following Faraday rotator
so that it falls on the next etalon array with the
same -45° polarization as the forward-propagating
transmitted beam from the previous stage. The undiffracted
beam passes straight through the volume
hologram and has its horizontal polarization rotated by
45° as it passes through the Faraday rotator, so that it
falls on the phase conjugate mirror (PCM) with a polarization
angle of 45°, the same as the counterpropagating
pump beams (which are not shown), producing
an identically polarized phase conjugate beam, which
is composed of an array of beams that retroreflect back
toward the etalon sources that each originated from.
This phase conjugate beam passes through the nonreciprocal
Faraday rotator picking up another 45° rotation
(instead of unwrapping the rotation as would occur
with a reciprocal optical activity based rotator),
emerging vertically polarized to act as the reference
beam array for the self-aligning holographic outer
product exposure with the backward-propagating error
signal. The backward-propagating error signal
emerges from the backside of the output etalon array
with a 45° polarization that is orthogonal to the forward-propagating
beam. This allows for the polarization
filtering based separation of the reflected forward-propagating
beam from the transmitted backward-propagating
beam as well as the independent
tuning of the relative Fabry-Perot resonance position
of the forward- and backward-propagating beams.
The backward-propagating error signal is rotated to a
vertical polarization by the Faraday rotator so that it
interferes in the volume hologram with the vertically
polarized phase conjugate reference beam and not with
the horizontally polarized undiffracted forward-propagating
signal. The interference of a backward-propagating
error signal emerging from a particular etalon
at the output with the phase conjugated forward-propagating
beam emerging from a particular etalon at
the input produces a self-aligning volume Fresnel holographic
interference pattern that interconnects
these two etalons for both forward- and backward-propagating
beams with the exact same diffraction
efficiency, or weight, due to the reciprocity of linear
electromagnetic systems. The interference of the
backward-propagating error beam with the phase conjugate
of the forward-propagating beam records a
Fresnel grating due to each pair of beams that is
present, perturbing the weighted interconnection matrix
represented by the hologram by the outer product
of the signal and error vectors and thereby pushing the
matrix toward the desired interconnection solution.
The backward-propagating beam is polarization
switched by the volume holographic diffraction mechanism,
producing a horizontally polarized beam which
is the appropriate weighted summation of the error
signal by the transpose of the interconnection matrix
seen by the forward-propagating beam. This passes
through the polarizer and is Faraday rotated by 45° to
be incident on the etalon array with a 45° polarization
angle, orthogonal to the forward-propagating beam,
and the same as the backward-propagating beam which
emerged from the previous output layer. The undiffracted
phase conjugate of the forward-propagating
beam needs to be blocked so that it is not confused with
the copropagating diffracted backward-error-propagating
signal, and this is accomplished by the polarizer
which blocks the vertical polarization of the undesired
phase conjugate reference beam. The indicated nonreciprocal
polarization filtering will also remove the unwanted
reflections from the nonlinear etalons and unwanted
diffraction terms produced by the hologram.
The diffracted phase conjugate reference and undiffracted
backward-error signal emerge at a different angle
and will not focus on the etalon; thus they can be
ignored, or they can be examined to determine intermediate
states of the hidden neurons. Each layer is completely
compatible with the previous and the following
layers so this type of learning network can be stacked up
to form a complex multilayer learning machine.
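The role of the nonreciprocal rotator in this polarization bookkeeping can be illustrated with a two-component polarization vector. The sketch below is a simplification under stated assumptions: polarization is tracked in a fixed laboratory frame, the phase conjugate mirror is taken to return the incident polarization unchanged, and the helper names `rotate` and `round_trip` are hypothetical.

```python
import math


def rotate(vec, degrees):
    """Rotate a real (x, y) polarization vector by the given angle."""
    c = math.cos(math.radians(degrees))
    s = math.sin(math.radians(degrees))
    x, y = vec
    return (c * x - s * y, s * x + c * y)


def round_trip(vec, degrees, reciprocal):
    """Pass through a rotator, reflect, and pass back through it.

    A Faraday (nonreciprocal) rotator adds the same rotation on the
    return pass; a reciprocal rotator undoes it. The mirror is assumed
    to preserve the polarization state (idealized phase conjugator).
    """
    out = rotate(vec, degrees)
    back = -degrees if reciprocal else degrees
    return rotate(out, back)


horizontal = (1.0, 0.0)
faraday = round_trip(horizontal, 45.0, reciprocal=False)
optical_activity = round_trip(horizontal, 45.0, reciprocal=True)
print("Faraday rotator round trip:   ", faraday)
print("Reciprocal rotator round trip:", optical_activity)
```

In this idealized model the double pass through the Faraday rotator turns a horizontal beam into a vertical one, while a reciprocal rotator returns it to horizontal, which is why only the nonreciprocal element lets the returning reference beam be separated by a polarizer.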

C.  Requirements  for  the  Holographic  Interconnection 

The dynamic holographic interconnection technique
described in this paper is based on the photorefractive
effect, which is a light-induced index of refraction
modulation that occurs in photoconductive
electrooptic crystals. A space charge grating image of
an interference profile is created by carriers ionized
from fixed traps into the conduction band, where the
mobile carriers redistribute under the influence of
drift, diffusion, and bulk photovoltaic effects, until
they recombine with an unoccupied trap. The redistributed
optically generated carriers produce a space
charge grating with a fundamental Fourier component
that may be phase shifted from the interference profile.
The spatial variations of the resulting space
charge pattern produce a corresponding electric field
through Poisson's equation. The space charge field
induces an electrooptic modulation of the local impermeability
tensor as long as the appropriate electrooptic
tensor coefficient is nonzero. In turn, this couples
the input field into a diffracted output field as long as
the appropriate impermeability tensor coefficient is
nonzero. The implementation presented in the previous
section is based on a polarization switching diffraction
mechanism for which it is required to have an
off-diagonal impermeability tensor coefficient. This
requires electrooptic tensor coefficients in the bottom
half of the reduced subscript electrooptic matrix,
which can take place in some electrooptic volume holographic
materials, such as Bi$_{12}$SiO$_{20}$, LiNbO$_3$, BaTiO$_3$,
and GaAs. Self-aligning recording combined with polarization
switching diffraction between linear eigenmodes
requires an optically isotropic medium (or one
in which anisotropy can be eliminated through the
application of a static field), with no optical activity,
and these conditions imply that a material of symmetry
group $\bar{4}3m$, such as some III-V semiconductors
(e.g., GaAs or InP), should be used as the photorefractive
holographic medium. The efficiency of the diffraction
depends on the effective coupling strength
which depends on the angle of the gratings, polarization
of the input wave, and momentum matching
(Bragg) condition in a rather complicated fashion.
However, use of a Fresnel hologram can produce an
averaging over all these effects for all the interconnection
holograms, while in a Fourier hologram with planar
fringes each grating has a different diffraction
efficiency.

The polarization switching diffraction efficiency
and the holographic storage capacity can be simultaneously
maximized by having the input and output
beams propagating at large angles, as indicated in the
figure. The unwanted polarization switching grating
exposures due to the simultaneous presence of multiple
reference (or object) beams produce crosstalk of
the undiffracted forward-propagating beam, which
can be eliminated with the indicated polarization filtering.
The storage capacity of the volume hologram
will enforce limits on the number of nonlinear devices
that can be interconnected and on their topology because
of the cone of ambiguity associated with Bragg
diffraction. A sparse array of etalons will have to be
utilized to implement a fully global interconnection
without unwanted crosstalk, which will also facilitate
the dissipation of heat generated in the nonlinear etalons
and thereby allow a very high speed of operation.
However, the learning operation must occur slowly for
the backpropagation algorithm to converge properly,
and this is well matched with most photorefractive
crystal volume holograms, because the crystal response
times are slow, and the perturbation of an existing
space-charge grating by a single outer product exposure
is very small.

It is necessary to be able both to selectively erase
holographic gratings, thus decreasing the connection
strength between particular etalons, and to
strengthen individual gratings, thereby increasing the
corresponding elements of the interconnection matrix.
Selective erasure can be accomplished by using a phase
encoded backward-propagating error signal, where a
phase angle of 0 is used to represent all positive error
signals, and a phase angle of $\pi$ is used to represent all
negative error signals. Fresnel gratings that are built
up with a phase angle of 0 can have the corresponding
interconnection decreased selectively by shifting the
recording interference profile by $\pi$, as demonstrated in
Sec. V, and by Huignard for Fourier holograms. Alternatively,
selective interconnection erasure might be
accomplished by strengthening interconnection gratings
when the applied bias field is in one direction,
causing the resulting space charge grating to shift away
from the optical intensity profile in the direction of the
E field by approximately $\pi/2$, while decreasing interconnection
gratings when the bias field is reversed,
producing a canceling space charge grating with a
phase shift of $-\pi/2$. Another approach to decreasing
interconnection strength would be to rely on the simultaneous
erasure of all the gratings by the optical readout
and thermal effects, thereby inserting a forgetting
term in the dynamical equation for the holographically
represented interconnection matrix. This approach
requires continuous reinforcement to avoid forgetting
everything that has been learned. Once learning has
been completed a mechanism of fixing the hologram
could be used to make the interconnections permanent.
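The phase-encoded strengthening and erasure of a single grating, together with an optional forgetting term, can be mimicked with one complex number per grating. This is a toy model under assumptions: each exposure adds a fixed complex increment, the decay is a simple multiplicative factor per exposure, and the names `expose`, `amplitude`, and `decay` are hypothetical.

```python
import cmath
import math


def expose(grating, amplitude, phase, decay=0.0):
    """Apply one exposure to a complex grating amplitude.

    The existing grating first decays by the factor (1 - decay), which
    plays the role of the forgetting term; the new exposure then adds
    amplitude * exp(i * phase).
    """
    return (1.0 - decay) * grating + amplitude * cmath.exp(1j * phase)


g = 0j
for _ in range(5):                       # positive errors: phase 0
    g = expose(g, 0.1, 0.0)
strengthened = abs(g)

for _ in range(3):                       # negative errors: phase pi
    g = expose(g, 0.1, math.pi)
weakened = abs(g)

print(f"after 5 in-phase exposures: {strengthened:.3f}")
print(f"after 3 pi-shifted exposures: {weakened:.3f}")
print(f"one step of pure decay: {abs(expose(1 + 0j, 0.0, 0.0, decay=0.1)):.3f}")
```

Exposures written with a phase shift of pi subtract from the stored amplitude, while a nonzero `decay` erases every grating uniformly unless it is reinforced.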

A scheme must be devised to implement negative
interconnection strengths, or else all the signals must be
placed on an appropriate bias. An attractive possibility
for the implementation of bipolar weights is to
use the phase shift of each grating to represent its sign
and count on destructive interference within each nonlinear
etalon to subtract the positively and negatively
weighted diffracted components. This approach is
sensitive to the phase response of the etalons, so it is
necessary to minimize (or to compensate for) nonlinear
phase shifts produced by the etalons and to avoid
phase sensitive switching behavior in the etalons.
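The sensitivity of this bipolar scheme to phase errors can be seen in a short coherent-sum sketch. It assumes ideal plane-wave-like addition of the diffracted components at one etalon; the function name `detected_intensity` and the `phase_error` parameter are hypothetical.

```python
import cmath
import math


def detected_intensity(activities, weights, phase_error=0.0):
    """Coherently sum weighted inputs, with the weight sign carried by phase.

    Positive weights diffract with phase 0 and negative weights with
    phase pi plus an optional error (for example a nonlinear phase shift).
    """
    field = 0j
    for a, w in zip(activities, weights):
        phase = 0.0 if w >= 0 else math.pi + phase_error
        field += abs(w) * a * cmath.exp(1j * phase)
    return abs(field) ** 2


balanced = detected_intensity([1.0, 1.0], [0.5, -0.5])
perturbed = detected_intensity([1.0, 1.0], [0.5, -0.5], phase_error=0.3)
print(f"ideal cancellation:      {balanced:.6f}")
print(f"with 0.3 rad phase error: {perturbed:.6f}")
```

With exact phases the two equal and opposite contributions cancel; a modest phase error leaves a residual intensity, which is the reason phase shifts in the etalons have to be minimized or compensated.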

IV. Self-Aligning Bidirectional Volume Holographic
Interconnections

The preliminary analysis of a bidirectional optical
interconnection system begins with an explanation of
the recording of a hologram by using a phase conjugated
reference beam. The 1-D system used in this analysis
is presented in Fig. 7 and consists of two lines
(planes) of optical neurons which need to be interconnected
by a volume hologram. The phase conjugate
mirror is used to conjugate the expanding waves emitted
from one plane of neurons and retroreflect them
back toward the sources from which they emerged, and
either direction can be the one chosen to be conjugated.
The field emitted by a line of J neurons, separated
by D, with an aperture profile $h(x_0, y_0)$, and propagating
at an off-axis angle with spatial frequency $\alpha$ is a
linear combination of off-axis spherically expanding
waves propagating toward the right. The undiffracted
field that passes straight through the volume hologram
(in the undepleted pump approximation) and
strikes the phase conjugate mirror (PCM) is simple in
the Fraunhofer regime of the individual apertures,
which for an aperture profile width of d is valid for
$z \gg \pi d^2/\lambda$ (≈ 1 mm for a 10-μm aperture) and is always
valid for Gaussian apertures:


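$$
\begin{aligned}
a(x',y',t) &= \exp(-i2\pi\nu t)\,\frac{1}{i\lambda z_s}\iint \sum_j a_j\,h(x_0-jD,\,y_0)\exp(i2\pi\alpha x_0)\exp\{i(\pi/\lambda z_s)[(x_0-x')^2+(y_0-y')^2]\}\,dx_0\,dy_0\\
&\approx \exp(-i2\pi\nu t)\,\frac{1}{i\lambda z_s}\sum_j a_j\exp(i2\pi jD\alpha)\exp\{i(\pi/\lambda z_s)[(x'-jD)^2+y'^2]\}\,H(u+\alpha,\,v).
\end{aligned}
\tag{12}
$$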


In this expression the neural activity pattern vector $a_j$
is represented as a spatially multiplexed array of etalon
output fields which will be used for learning, but
the intensities $|a_j|^2$ are a more likely representation of
the neuron outputs that will be used for subsequent
nonlinear processing.
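The Fraunhofer distance quoted for a 10-μm aperture can be checked with a one-line estimate. The sketch assumes the usual criterion that the propagation distance must be much greater than pi d^2 / lambda; the wavelength of 0.5 μm is an illustrative assumption, since the text does not state one here, and the function name is hypothetical.

```python
import math


def fraunhofer_distance(aperture_width, wavelength):
    """Characteristic distance pi * d^2 / lambda, in the units of the inputs."""
    return math.pi * aperture_width ** 2 / wavelength


# 10-micrometer aperture; the 0.5-micrometer wavelength is assumed.
z = fraunhofer_distance(10e-6, 0.5e-6)
print(f"pi d^2 / lambda = {z * 1e3:.2f} mm")
```

For visible wavelengths the result is a fraction of a millimeter, the same order as the roughly 1 mm figure given in the text, so etalon planes placed centimeters from the hologram are comfortably in the far field of each aperture.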

Each source produces an off-axis expanding spherical
wave with a linear phase factor given by the source
position. The Fourier transform of a source aperture
is given by $H(u,v)$, and its size and position shift with
the propagation distance $z_s$. The distance between
the input plane of neurons and the front side of the
volume hologram is $z_0$, the thickness of the hologram is
$L$, the index of refraction of the photorefractive medium
is $n_0$, and the distance from the hologram to the
phase conjugate mirror is $z_c$; thus the total optical path
length between the neurons and the PCM is
$z_s = z_0 + n_0 L + z_c$. This wavefront is phase conjugated by the
PCM, which retroreflects each expanding spherical
wavefront back toward its point of origin. The resulting
field within the holographic crystal is dependent on
the z coordinate, and since both writing waves are
incident on the hologram from the right, z is defined to
be zero at the right edge of the crystal and to increase to
L at the left edge. The phase conjugated reference
wave within the hologram is most easily expressed in
terms of the optical path length between the input
neurons and a given plane within the hologram,
$z' = z_0 + (L - z)n_0$:

$$
a^{*}(x,y,z,t) = \exp(-i2\pi\nu t)\,\frac{1}{-i\lambda z'}\sum_j a_j^{*}\exp(-i2\pi jD\alpha)\exp\{-i(\pi/\lambda z')[(x-jD)^2+y^2]\}\,H(u+\alpha,\,v).
$$

This equation represents a left propagating quadratically
curved superposition of waves that are focusing
toward the J source neurons. The profiles of these
focusing beams are given by the transform of the individual
source apertures $H(u+\alpha, v)$ that are scaled and
shifted with the z coordinate within the holographic
medium.

Similarly, the backpropagating error field emitted
by a line of K neurons separated by $D'$ in the second
layer can be described as a superposition of spherically
expanding waves. The separation between the
output neurons and a plane z within the hologram is
$\bar{z} = z_1 + n_0 z$, which is a reversed coordinate from that
used for the forward-propagating wave:

$$
\begin{aligned}
b(x,y,z,t) &= \exp(-i2\pi\nu t)\,\frac{1}{i\lambda\bar{z}}\iint \sum_k b_k\,h(x_1-kD',\,y_1)\exp(i2\pi\alpha x_1)\exp\{i(\pi/\lambda\bar{z})[(x_1-x)^2+(y_1-y)^2]\}\,dx_1\,dy_1\\
&\approx \exp(-i2\pi\nu t)\,\frac{1}{i\lambda\bar{z}}\sum_k b_k\exp(i2\pi kD'\alpha)\exp\{i(\pi/\lambda\bar{z})[(x-kD')^2+y^2]\}\,H(u+\alpha,\,v).
\end{aligned}
\tag{13}
$$

Fig. 7. Self-aligning bidirectional dynamic volume holographic
interconnection using a phase conjugated reference.

Fig. 8. Diffracted spot produced by a high diffraction efficiency
lensless Fresnel volume hologram recorded in LiNbO$_3$ and an exposure
which shows just the peak.


This backpropagating error vector $b_k$ produces a wavefront
that interferes with the retroreflected phase conjugate
wavefronts due to the forward-propagating
sources in the first layer within the volume hologram.
This records a self-aligning interference pattern that
modulates the index of refraction within the holographic
medium. The modulation term can be expressed
as a curved and chirping fringe pattern within
the overlap region of the diffracting wavefronts in the
crystal. The repetitive presentation of training patterns
and bipolar error patterns to the front and back
of the single layer being described will result in the
time integration of successive outer product connectivity
patterns:


[Equation (14), the time-integrated interference pattern, is illegible in the scanned source.]    (14)


This expression represents a superposition of $J \times K$ families
of elliptical fringes within the volume of the hologram
with each pair of sources at the foci of a set of
elliptical shells, and $z_t = z_0 + n_0 L + z_1$ is the total
distance between the etalon planes. This time-integrated
interference pattern will be transformed into a
proportional index modulation with a possible phase
shift through the photorefractive effect.

Fig. 9. Positional sensitivity of the Fresnel volume hologram: (a)
out of the primary interaction plane; (b) in the interaction plane.

For the chirped and curved volume Fresnel phase
holograms being analyzed here a momentum space
analysis is inappropriate, since spatial frequency and
fringe tilt are spatially varying, resulting in a poorly
defined perturbation momentum vector; thus an explicit
integration of the diffracted field produced at
each z-plane should be carried out instead. After the
hologram is recorded, it is reilluminated by a weighted
superposition of expanding spherical waves which are
diffracted by all the index modulations that are
present. This analysis can be carried out for either
forward- or backward-propagating waves in an identical
manner, but we only consider illumination with a
forward-propagating wave here. When the volume
hologram is illuminated by the diffracted wavefront
from a new input neural activity pattern $a'_{j'}$, the diffracted
field at each plane z will contain a matched
term, which will produce focusing wavefronts propagating
toward the output neurons, and a number of
unwanted crosstalk terms. Combining the results of
Eqs. (12) and (13) we can derive an expression for the
diffracted field produced at each plane of the volume
hologram:


[Equation (15), the diffracted field $D(x,y,z,t)$ produced at each plane of the hologram, is illegible in the scanned source.]    (15)


With the approximations that the last two terms approximately
overlap so that the product is equal to a
constant, and the paraxial approximation, the diffracted
field can be propagated to the plane of output
neurons from any diffracting plane within the volume
hologram. This assumes the undepleted pump
approximation, which is a reasonable approximation
for diffraction efficiency ≤10%. Each plane of infinitesimal
thickness produces an appropriate focusing
contribution, with the appropriate focal length, magnification,
and phase to produce a focal spot with profile
$h(x_1 - kD' - (j' - j)D(\bar{z}/z'),\, y_1)$. We need to sum
up all the contributions throughout the thickness L of
the hologram to obtain a Bragg selection condition,
which will require $j = j'$, so that focal spots are only
produced at each of the K output neurons:


In  this  equation  a  number  of  simplifications  have  been 
made,  but  the  neglected  terms  lead  to  an  increase 
in  the  Bragg  selectivity.  All  the  phase  factors  have 
been  lump^  into  the  term  exp({^).  The  integration 
over  z  produces  a  sine  function  of  (j  —  f),  which  is 
analogous  to  the  thick  hologram  Bragg  condition  for 
these  elliptical  fringes.  As  long  as  the  separations 
between  ^  input  neurons  D,  and  output  neurons  ly, 
are  lar^  enough  for  a  given  hologram  thickness  L  and 
recording  geometry,  zo^i,a,  we  can  assume  perfect 
Bragg selectivity, and, therefore, $j = j'$. This would be the
normal Bragg condition if Fourier lenses were inserted in the
processor, and it results in a positional selectivity in the
plane of the lines of neurons which effectively eliminates all
the unwanted shift-invariant crosstalk terms that are present
with a thin holographic grating.


f  f  1 

«(*iO'i.<)  ®  ^ ^  j  j  D(xcf^A)  expi(»-/Xr)((x,  -  *)*  +  (y,  -  y)*]}dxdy  Uz 

«  exp(-i2TKt)  ezp(-i2Taz,)  vxpiup^  S  S  2  f 

X  jf^  -kiy-{j-  j')D  p)  #xpJ-iaD0  -  n  pji* 

»  exp(-i2»»t)  exp(-i'2iraX})  •xp(»^')  V  o}.  ^  V  f  aj{f)bl{f)(if 

X  -  ,D.  -  U  - /ID 

\  Zo  +  nLj  (zo  +  f.n)*  J 

«  exp(-i2Tr/)  exp(-i2Tax,)  exp(i»  ^  a]  ^  |^|  ay((')6t((')o«'j  )i(z,  -kiyy). 


(16) 
For a hologram of thickness $L = 1$ cm, placed 1 cm off-axis and
10 cm from the input and output neural planes, we can separate
the etalons by $D = 100$ μm (at the fourth zero of the sinc),
allowing 100 to be packed per centimeter. However, for a 90°
diffraction angle, as illustrated in Fig. 6, we can bring the
etalons to within $D \approx 10$ μm of each other, allowing a
linear packing density of 1000 etalons/cm. When 2-D arrays of
neurons are to be interconnected, additional constraints must be
imposed on their topology to achieve the appropriate selectivity
of the diffracted orders, and only a sparse selection of a 2-D
array of etalons may be
utiUzed,  containing  between  10*  and  10*  etelons/in.^.^ 
An appropriate topology for the utilized etalons can be derived
by considering the interconnection to be space variant in the
interaction plane and space invariant in the orthogonal
dimension. In the case of a Fresnel volume hologram there is
some space-variant widening of the impulse response in the
direction orthogonal to the interaction plane, which is not
present with Fourier holograms.
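
As a quick check of the packing figures quoted above, the linear density is simply the reciprocal of the etalon spacing. The sketch below assumes nothing beyond that; the helper name `etalons_per_cm` is ours and purely illustrative.

```python
def etalons_per_cm(spacing_um: float) -> float:
    """Linear packing density for an etalon spacing D given in micrometers."""
    um_per_cm = 1.0e4
    return um_per_cm / spacing_um


# Spacings quoted in the text for the two recording geometries.
off_axis_density = etalons_per_cm(100.0)     # hologram placed 1 cm off-axis
right_angle_density = etalons_per_cm(10.0)   # 90 degree diffraction angle
print(off_axis_density, right_angle_density)
```

The tenfold gain in linear density comes entirely from the tighter Bragg selectivity of the 90° geometry.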

The resulting field incident on the spatially multiplexed output
neurons is found to be proportional to the desired matrix vector
product of the input activity pattern $a_j$ with the
time-integrated outer product of the sequence of forward and
backward waves. The intensity at the output neural plane is
given by the modulus squared of the field, and this intensity
will be detected by the neurons and used for subsequent
nonlinear processing.
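
The operation described in this paragraph can be sketched numerically. The following is a minimal illustration, not a model of the optics: it assumes real-valued amplitudes (so no complex conjugation is needed), perfect Bragg selectivity, and unit proportionality constants. The function names are hypothetical.

```python
def time_integrated_weights(forward_seq, backward_seq):
    """Sum the outer products b(t) a(t)^T over a sequence of exposures.

    Real amplitudes are assumed here, so no complex conjugation appears.
    """
    n_out, n_in = len(backward_seq[0]), len(forward_seq[0])
    weights = [[0.0] * n_in for _ in range(n_out)]
    for a_t, b_t in zip(forward_seq, backward_seq):
        for k in range(n_out):
            for j in range(n_in):
                weights[k][j] += b_t[k] * a_t[j]
    return weights


def output_intensity(weights, activity):
    """Field at each output neuron is a matrix-vector product; detect |u|^2."""
    fields = [sum(w * a for w, a in zip(row, activity)) for row in weights]
    return [abs(u) ** 2 for u in fields]


# Two exposures connecting two input neurons to three output neurons.
forward_seq = [[1.0, 0.0], [0.0, 1.0]]
backward_seq = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
weights = time_integrated_weights(forward_seq, backward_seq)
intensity = output_intensity(weights, [1.0, 1.0])
print(weights)
print(intensity)
```

Because the detectors respond to intensity, the sign of each field component is lost at the output plane; this is one reason bipolar weights need separate treatment.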

This is a fully self-aligning dynamic volume holographic global
interconnection scheme which works reciprocally for forward and
backward waves as required by the backpropagation algorithm.
This interconnection technique requires no lenses, because the
Fresnel hologram accomplishes the imaging operation.

V. Experimental Investigation of Fresnel Volume Hologram
Interconnections

Volume Fresnel holograms were recorded in photorefractive
crystals as the interference pattern between expanding and
collapsing spherical waves to test their capabilities as
lensless interconnection elements. When this volume hologram was
reilluminated by one or more of an array of expanding reference
beams, the collapsing spherical object waves were reproduced,
which focused to an array of small spots at the output plane.
First, a single input point was interconnected to a single
output point using an expanding wave interfered with a
collapsing wave in the volume of a LiNbO$_3$ crystal, and the
interference pattern was time integrated for several minutes to
build up a reasonably high diffraction efficiency grating. A
magnified image of the diffracted focal spot which was produced
at the output plane when the crystal was reilluminated by the
expanding reference beam is shown in Fig. 8. A good focused spot
is produced, but when the film is overexposed a large amount of
sidelobe structure becomes apparent. A large amount of fanning
of the diffracted light is produced in the plane of the crystal
c axis due to the recording of additional gratings in the
crystal between the reference beam and scattered object beam. A
vertical line appears at the output plane which is due to the
Bragg matched diffraction of the reference beam by the gratings
formed between the scattered reference and the object beam, and
this line is actually a small part of a large circle of
confusion which passes through the reference source and the
object virtual source. These fanning components built up over a
longer time scale than the desired focusing diffracted light and
were not visible with short holographic exposures. The weak
additional spot is a multiple reflection artifact. A measurement
of the Bragg positional sensitivity in and out of the principal
interaction plane is shown in Fig. 9. In the plane a good
approximation to a sinc function with 21-μm width is obtained,
which is near the expected width for this experimental geometry.

The measurement was obtained by translating the Fresnel volume
hologram and measuring the resulting diffraction efficiency.
However, when the hologram was rotated in the plane, and any
residual translation was compensated, a diffraction efficiency
was measured that was essentially independent of angle, as
expected for these angularly diverse volume holograms. Out of
the interaction plane the Fresnel hologram diffraction
efficiency was quite insensitive to the hologram position.
However, the position of the diffracted focused spot translated
across the detector array as the hologram was moved, indicating
that the holographic interconnection was space invariant in this
dimension. A Fresnel hologram that is thick in relation to the
separation between planes produces a vertically widening impulse
response as the out-of-plane offset is increased, due to
different offset magnifications at different hologram depths.
This feature needs to be considered when selecting a 2-D neuron
array topology for use with the Fresnel hologram interconnection
scheme.

An optical neural network interconnection pattern requires many
point sources to be imaged to many other virtual sources, and
the Fresnel hologram was tested in this application by using
lenslet arrays for the optical sources. A line array of real
sources produced by a 1-D lenslet array was interconnected to a
2-D array of virtual sources that was produced by imaging the
focal plane of a lenslet array through and beyond the volume
hologram. In this interconnection experiment approximately fifty
sources were interconnected with a 50 × 50 array of output focal
spots, thereby implementing more than $10^5$ holographic
interconnection lenses. A small portion of the diffracted output
plane produced by this hologram when it was illuminated by the
object wave is shown in Fig. 10. This looks almost identical to
the image of the lenslet array produced by the object beam, and
no fanning artifacts like those shown in Fig. 8 are visible.
This is because the diffraction efficiency of each
interconnection hologram is extremely small in this case, and
the weak fanning artifacts produced by different sources do not
add up constructively.

Fig. 10. Diffracted output produced by a lensless holographic
interconnection.

Adaptive holographic interconnection networks must be able to
represent bipolar weights and to decrease interconnection
strengths, so selective erasure was examined as one possible
technique that can be used for both these purposes. To show that
Fresnel gratings could be selectively erased, Bi$_{12}$SiO$_{20}$
was used as the holographic recording medium so that
photovoltaic effects could be eliminated and faster response
times could be obtained. A piezoelectric mirror was used to
phase modulate an object beam with a π phase shift increment.
Interconnection gratings were built up with one phase, then the
object phase was shifted, and the diffraction efficiency of the
reference beam into the object focal spot was measured as the
hologram was erased and rewritten with a π phase shift. Because
different wavelength probe beams cannot be used to measure the
diffraction efficiency of Fresnel holograms, the object beam and
diffracted beam were alternately chopped in a nonoverlapping
fashion to measure the diffraction efficiency seen by the
reference beam as a function of time. An example of this type of
selective erasure process is shown in Fig. 11(b), and it is to
be compared with the incoherent erasure that was obtained by
blocking the reference beam as shown in Fig. 11(a). The
selective erasure was much faster than the incoherent erasure
process, or the writing process after the previous grating was
erased, because the incoherent erasure and phase shifted writing
are cooperating processes during selective erasure, while they
are competing processes when writing the hologram. The phase
could be repetitively shifted by π as shown in Fig. 11(c), and a
succession of out-of-phase gratings can be written and erased.
Other gratings within the crystal were not erased any faster
with this phase shifted reference approach than they were
normally by incoherent erasure, which demonstrates that
selective erasure of the Fresnel hologram is occurring
throughout the volume of the crystal.
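
The speed advantage of selective erasure can be illustrated with a simple first-order model. This is only a sketch under assumptions that are ours, not measurements from the experiment: the normalized grating amplitude $g$ relaxes exponentially toward its driving term with a single time constant $\tau$, the drive is zero during incoherent erasure, and the drive is $-1$ when writing with a π phase shift.

```python
import math


def incoherent_erasure(t, tau=1.0, g0=1.0):
    """Grating amplitude decaying toward zero with no coherent writing."""
    return g0 * math.exp(-t / tau)


def phase_shifted_writing(t, tau=1.0, g0=1.0, target=-1.0):
    """Grating amplitude relaxing toward a pi-shifted steady state."""
    return target + (g0 - target) * math.exp(-t / tau)


def time_to_reach(amplitude_fn, level, dt=1e-3, t_max=20.0):
    """First sampled time at which |amplitude| falls to the given level."""
    for n in range(int(t_max / dt)):
        t = n * dt
        if abs(amplitude_fn(t)) <= level:
            return t
    return None


# Time (in units of tau) for the amplitude to fall to 10% of its start value.
t_incoherent = time_to_reach(incoherent_erasure, 0.1)
t_selective = time_to_reach(phase_shifted_writing, 0.1)
print(t_incoherent, t_selective)
```

In this model the old grating decays while a grating of opposite sign grows, so the two processes cooperate and the amplitude passes through zero well before plain decay would reach the same level.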

Polarization switching diffraction can be demonstrated in
Bi$_{12}$SiO$_{20}$ by writing a grating in the 110 direction.
The propagating eigenmodes are circular without an applied
field, because of optical activity, and the right mode can be
coupled to the left mode through the off diagonal tensor
components of a photorefractively induced perturbation grating.
Right circularly polarized expanding spherical reference waves
were interfered with right circularly polarized focusing object
beams to record polarization switching Fresnel volume holograms
in a properly rotated Bi$_{12}$SiO$_{20}$ crystal. The
diffracted field focused to the object beam focal spot, and the
polarization state was analyzed with a properly rotated
quarterwave plate and polarizer, and it was found to be very
nearly orthogonal to the polarization of the input object beam.
The orthogonality of these modes can be improved by careful
alignment of the crystal axes with respect to the principal
interaction plane, but perfect orthogonality is probably
impossible with a Fresnel hologram because of the diversity of
the birefringent gratings. When 99% of the object beam can be
filtered with this polarization filtering scheme, a 5%
diffraction efficiency hologram results in a 20% feedthrough of
the undiffracted light measured with respect to the diffracted
polarization switched light. This suppression ratio of the
undiffracted beam must be improved if the polarization
multiplexed architecture is to be used for a backpropagation
network, so that the undiffracted phase conjugate reference will
not corrupt the diffracted backpropagating error.

Fig. 11. Erasure processes in Bi$_{12}$SiO$_{20}$: (a)
incoherent erasure process; (b) selective erasure process using
a π phase shifted reference and the phase shift signal; (c)
repetitive π phase shift writing and erasure (1 s/div).
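
The feedthrough figure follows from simple arithmetic, sketched below. The function name is hypothetical, and the model assumes only that the filter passes a fixed fraction of whatever light is not diffracted. Including the 5% depletion of the undiffracted beam gives 19%; neglecting it gives the 20% quoted above.

```python
def feedthrough_ratio(diffraction_efficiency, filter_rejection):
    """Leaked undiffracted power relative to the diffracted power."""
    undiffracted = 1.0 - diffraction_efficiency
    leaked = undiffracted * (1.0 - filter_rejection)
    return leaked / diffraction_efficiency


# Values from the text: 5% diffraction efficiency, 99% rejection.
ratio = feedthrough_ratio(0.05, 0.99)
ratio_no_depletion = (1.0 - 0.99) / 0.05
print(ratio, ratio_no_depletion)
```

Either a higher diffraction efficiency or a better polarization extinction lowers this ratio.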

VI. System Requirements

A complete two-layer system, illustrated in Fig. 12, requires a
high speed method of entering data for pattern transformation
processing and another means of introducing the
backward-propagating error signals for the learning phase.
Probably the best approach to high speed data entry at the back
end of the system is a sparse parallel laser diode array or a
fiber-optic input array, demagnified onto the first layer
nonlinear etalon array which is operated in the bistable regime.
In this manner the subthreshold coherent bias beams transmitted
by each addressed device can be modulated by the data signals,
thereby using the input nonlinear Fabry-Perot etalon array as a
high speed incoherent to coherent converter with memory. At the
final layer of the system error signals need to be computed and
injected back into the system with the appropriate polarization
or wavelength and the phase shift or timing needed to represent
the sign of the error. The system can be designed with either
optical or electronic error detection and generation circuitry
at the output to introduce the backpropagating error. Optical
subtraction techniques can be considered for an optical approach
to teaching the system. Image subtraction using a phase
conjugated Michelson interferometer appears to be a promising
approach for this application since it produces subtracted
fields with the appropriate phase shift to represent the sign of
the error, without the accurate phase adjustments required by
other interferometric approaches to image subtraction.
Alternatively, since the computational load required at the
output is relatively minor, optical detectors can be combined
with electronic subtraction from the target vector to generate
the bipolar error vector, which can be applied to a spatial
light modulator at the output to introduce the backpropagating
error. When the number of outputs of the pattern transformation
procedure is <1000, they can be arrayed in a linear format which
allows the utilization of high speed linear detector arrays for
output, and the utilization of linear spatial light modulators
to introduce the backward-propagating error signals.

The fan-out capability of each layer is determined by the gain
of the nonlinear devices, the holographic diffraction
efficiency, and the polarization component throughput, and it
dictates an information collapsing network architecture. For
example, if the product of optical efficiencies is only 3%, a
network with 30,000-bit input pattern vectors might be processed
by 1000 hidden units that communicate with thirty output
devices, which simplifies the error generation process at the
output. The ability of the system to process large amounts of
data in parallel at a very high speed is limited by the
electronic addressing of the input array and the output
photodetector array readout time, and not by the intervening
optical system, because of the extremely fast response
achievable with nonlinear etalons and the almost instantaneous
optical interconnection delay. The optical power requirements of
the system are primarily dictated by the first layer of
nonlinear etalons, since there are many more in this layer than
in the succeeding layers for a collapsing network. The
first-layer etalons are not bidirectional and can be optimized
to have a low switching energy. Bistable nonlinear etalons have
been operated with a 3-pJ switching energy at a rate of
≳100 MHz, which leads to a power requirement of 0.4 mW/etalon or
12 W for 30,000 input etalons. Only a portion of this power is
dissipated within the nonlinear etalons, and a heat dissipation
requirement of only a few watts per cm² should be achievable
with forced air or liquid cooling techniques. Most of this power
is supplied by a high power coherent pump beam that is used to
bias each bistable device just below the bistable loop, and
~10%, or 40 μW, is required per laser or fiber-optic input to
the network. The backpropagating neurons will require more
optical power because the dual functions of the bidirectional
cavities conflict with the requirements for a low power device.
Since there are not as many hidden and output devices as input
devices, the system's power requirements are primarily dictated
by the size of the input array.

Fig. 12. Complete system for two-layer backpropagation optical
learning including massively parallel input laser array and
electronic error detection at the output. LD = laser diode (or
fiber optics), NLFP = nonlinear Fabry-Perot, PCM = phase
conjugate mirror, BEP SLM = spatial light modulator for
backward-propagating error.
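
The power budget above can be checked with a few lines of arithmetic. This is a rough sketch using only the figures quoted in the text; the helper names are ours. Note that 3 pJ at exactly 100 MHz gives 0.3 mW, so the quoted 0.4 mW per etalon corresponds to a rate somewhat above 100 MHz, and 10% of 0.4 mW is 40 μW.

```python
def etalon_power_w(switching_energy_j, switching_rate_hz):
    """Average optical power needed to switch one etalon at a given rate."""
    return switching_energy_j * switching_rate_hz


def array_power_w(power_per_etalon_w, n_etalons):
    """Total power for an array of identical etalons."""
    return power_per_etalon_w * n_etalons


# 3 pJ at exactly 100 MHz, and the rate implied by 0.4 mW per etalon.
lower_bound_w = etalon_power_w(3e-12, 100e6)
implied_rate_hz = 0.4e-3 / 3e-12

total_w = array_power_w(0.4e-3, 30_000)   # first-layer array of 30,000 etalons
signal_w = 0.10 * 0.4e-3                  # roughly 10% supplied per data input
print(lower_bound_w, implied_rate_hz, total_w, signal_w)
```

The total scales linearly with the number of first-layer etalons, which is why the input array dominates the budget in a collapsing network.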

The forward-propagating signal can be a narrow pulse since the
response of a GaAs nonlinear Fabry-Perot etalon is determined by
the peak power incident. In this case the backward-propagating
error signal can be either pulsed or cw. In the pulsed mode the
PCM would need to have practically instantaneous response, such
as a nonlinear optical semiconductor might provide, and the
forward- and backward-propagating pulses could be time jittered
so they do not overlap in the volume hologram, but the phase
conjugate reference and the backward-propagating error pulse
would overlap within the crystal, thereby exposing a hologram.
Alternatively, the backward-propagating error signal could be a
low power cw beam that would not have a high enough peak power
to nonlinearly modify the index within the Fabry-Perot etalons,
and the forward-propagating pulse could be turned into a quasi
cw phase conjugate reference by using a photorefractive
crystal-based PCM which has a slow integrated response. The
Fabry-Perot etalons would need to have a slow relaxation time of
the nonlinearly shifted index (this requires long carrier
lifetimes and should lead to lower etalon switching energies);
thus the probe beam would have the appropriate response for most
of the interval between pulses of the forward beams. In this
case the holographic exposure would be due to the time integral
of the cw waves in the volume hologram, and the orthogonally
polarized and pulsed forward-propagating beam would not
contribute significantly to the hologram exposure.

VII. Conclusion

The 3-D storage capacity of volume holograms allows the
construction of huge globally interconnected multilayer optical
networks which are well beyond the projected capabilities of
alternative technologies. The optical system seems well matched
to the bidirectional requirements of a backpropagation learning
system because of the intrinsically reciprocal nature of optical
interconnections. Error driven learning operations, such as
backpropagation, should be able to compensate for many of the
technological flaws inherent to an optical implementation by
adaptively sensing the misbehavior of the system and driving it
in the appropriate direction necessary to overcome its
imperfections. The nonideal optical implementation of a
backpropagation network may actually have improved performance
over that of an idealized digital simulation because noise will
always be present in the system, helping it to avoid shallow
local minima, and pushing the interconnection matrix away from
solution boundaries. Imperfections of the holographic
interconnection will help the system perform symmetry breaking,
which the idealized model cannot perform spontaneously. The
simultaneous self-aligning and learning of the optical system
make this approach to multilayer optical neural processing
experimentally feasible and allow the implementation of
complicated systems that could not be completely specified a
priori but can be learned and modified as the desired processing
operation slowly changes. The slow learning of the holographic
crystals combined with the extremely high speed processing of
the nonlinear etalons gives the system an enormous throughput
potential and the capability for solving complicated cognitive
problems.
Adaptive optical networks using photorefractive crystals


Demetri Psaltis, David Brady, and Kelvin Wagner


The capabilities of photorefractive crystals as media for
holographic interconnections in neural networks are examined.
Limitations on the density of interconnections and the number of
holographic associations which can be stored in photorefractive
crystals are derived. Optical architectures for implementing
various neural schemes are described. Experimental results are
presented for one of these architectures.


I. Introduction

Learning is the most distinctive feature of a neural computer
and in many respects it is this aspect that gives neural
computation an advantage over alternative computational
strategies. A neural computer is trained to produce the
appropriate response to a class of inputs by being presented
with a sufficient number of examples during the learning phase.
The presentation of these examples causes the strength of the
connections between neurons that comprise the network to be
modified according to the specifics of the learning algorithm. A
successful learning procedure will result in a trained network
that responds correctly when it is presented with the examples
it has seen previously and also with other inputs that are in
some sense similar to the known patterns. When we consider a
physical realization of a neural network model, we have two
options in incorporating learning capability. The first is to
build a network with fixed but initially programmable
connections. An auxiliary, conventional computer can then be
used to learn the correct values of the connection strengths,
and once learning has been completed the network can be
programmed by the computer. While this approach may be
reasonable for some applications, a system with continuously
modifiable connections presents a much more powerful
alternative.

In this paper we consider the optical implementation of learning
networks using volume holographic interconnections in
photorefractive crystals. The use of volume holograms permits
the storage of a very large number of interconnections per unit
volume, whereas the use of photorefractive crystals permits the
dynamic modification of these connections, thus allowing the
implementation of learning algorithms. We first briefly review
the major types of learning algorithms that are being used in
neural network models. We then estimate the maximum number of
holographic gratings that can simultaneously exist in a
photorefractive crystal. Since in an optical implementation each
grating corresponds to a separate interconnection between two
neurons, this estimate gives us the density of connections that
are achievable with volume holograms. The next topic that we
address is how the modulation depth of each grating (or
equivalently the strength of each connection) can be controlled
through the implementation of learning algorithms. Two related
issues are investigated: the optical architectures which
implement different learning algorithms, and the reconciliation
of the physical mechanisms that are involved in the recording of
holograms in photorefractive crystals with the dynamics of the
learning procedures in neural networks.

II. Learning Algorithms

For the purposes of this discussion it is convenient to separate
the wide range of learning algorithms that have been discussed
in the literature into three categories: prescribed learning,
error driven learning, and self-organization. We will draw the
distinction among these with the aid of Fig. 1, where a general
network is drawn with the vector $x(k)$ as its input and $y(k)$
the output at the $k$th iteration (or time interval). The vector
$z(k)$ is used to represent the activity of the internal units
and $w_{ij}(k)$ is the connection strength between the $i$th and
the $j$th units. Let $x^{(m)}$, $m = 1 \ldots M$, be a set of
specified input vectors and let $y^{(m)}$ be the responses which
the network must produce for each of these input vectors.


Fig.  1.  General  neural  network  architecture. 


A prescribed learning algorithm calculates the strength of each
weight simply as a function of the vectors $x^{(m)}$ and
$y^{(m)}$:

$$w_{ij} = f_{ij}\left(x^{(m)}, y^{(m)}\right), \qquad m = 1 \ldots M. \tag{1}$$

This type of procedure is relatively simple (easy learning). It
is perhaps the most sensible approach in a single layer network.
The widely used outer product algorithm is an example of this
type of learning algorithm, as are some schemes which utilize
the pseudoinverse. Despite its simplicity, prescribed learning
is limited in several important respects. First, while
prescribed learning is well understood for single layer systems,
the existing algorithms for two layers are largely localized
representations; each input activates a single internal neuron.
Moreover, the entire learning procedure usually has to be
completed a priori. This last limitation is not encountered in
the simplest form of prescribed learning, the outer product
rule:

$$w_{ij} = \sum_{m=1}^{M} y_i^{(m)} x_j^{(m)}. \tag{2}$$

In this case new memories may be programmed by simply adding the
outer products of new samples to the weight matrix. Note that
once the interconnection matrix has been determined by a
prescribed learning algorithm, it may be expressed in the form
of a sum of at most $N$ outer products, where $N$ is the total
number of neurons in each layer. Since volume holograms record
interconnection matrices represented by sums of outer products
in a very natural way, matrices which can be expressed in this
form are particularly simple to implement in optics.
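
The outer product rule is easy to state in code. The sketch below is an illustration under our own assumptions: bipolar (±1) patterns, a hard threshold on recall, and mutually orthogonal inputs so that recall is exact. The function names are hypothetical.

```python
def outer_product_weights(inputs, outputs):
    """Build w[i][j] as the sum over m of y_i^(m) * x_j^(m)."""
    n_out, n_in = len(outputs[0]), len(inputs[0])
    weights = [[0.0] * n_in for _ in range(n_out)]
    for x, y in zip(inputs, outputs):
        add_memory(weights, x, y)
    return weights


def add_memory(weights, x, y):
    """Program one more association by adding its outer product in place."""
    for i, y_i in enumerate(y):
        for j, x_j in enumerate(x):
            weights[i][j] += y_i * x_j


def recall(weights, x):
    """Threshold the matrix-vector product to recover a bipolar response."""
    sums = [sum(w * x_j for w, x_j in zip(row, x)) for row in weights]
    return [1 if s > 0 else -1 for s in sums]


# Two orthogonal bipolar inputs and the responses they should produce.
inputs = [[1, 1, -1, -1], [1, -1, 1, -1]]
outputs = [[1, -1], [-1, 1]]
weights = outer_product_weights(inputs, outputs)
print(recall(weights, inputs[0]), recall(weights, inputs[1]))
```

Adding a memory never requires revisiting earlier samples, which is the property that lets this rule escape the a priori limitation noted above.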

Error driven learning is distinguished by the fact that the
output of the system, $y(k)$, is monitored and compared to the
desired response $y^{(m)}$. An incremental change is then made
to the interconnection weights
to  reduce  the  error: 

$$w_{ij}(k+1) = w_{ij}(k) + \Delta w_{ij}(k). \tag{3}$$

The change $\Delta w_{ij}$ is calculated from the vectors
$x^{(m)}$ and $y^{(m)}$ and the current setting of the weight
matrix $w_{ij}(k)$ (from which the state of the entire network
can be calculated). The perceptron and adaline algorithms are
examples of error driven learning for single layer networks.
Interest in such learning algorithms has been renewed recently
by the development of procedures suitable for multilayered
networks. Error driven algorithms (hard learning) are more
difficult to implement than prescribed learning since they
require a large number of iterations before errors can be
reduced to sufficiently low levels. In multilayered systems,
however, this type of learning can provide an effective
mechanism for matching the available resources (connections and
neurons) to the requirements of the problem. In optical
realizations error driven algorithms are more difficult to
implement than prescribed approaches due to the need for
dynamically modifiable interconnections and the incorporation of
an optical system that monitors the performance and causes the
necessary changes in the weights. While this problem could be
avoided by performing learning off line in computer simulations
and recording the optimized interconnection matrix as in
prescribed learning, this approach has the disadvantage that
once again the matrix is fixed a priori, thus preventing the
network from being adaptive. In subsequent sections we will
consider a relatively simple form of Eq. (3) in which
$\Delta w_{ij}(k)$ depends only on locally available
information, i.e., $z_i$ in one layer and $z_j$ in an adjacent
layer.

The perceptron and the backward error propagation algorithms
both fall in this subcategory if we allow the neuronal activity
$z_j$ to include error signals, i.e., if each neuron has
distinct signal and error outputs which are separated temporally
or spatially. An example of such a neuron implemented in optics
is given below in conjunction with an optical back error
propagation system.
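
A local error driven update of this kind can be sketched with the single layer perceptron rule, in which each weight change is the product of an error signal at the output neuron and the activity of the input neuron. The example below is a minimal illustration under our own assumptions (bipolar units, a constant bias input, a learning rate of 0.5, and the AND function as the target); none of these choices come from the paper.

```python
def threshold(s):
    """Hard-limiting neuron: +1 above zero, otherwise -1."""
    return 1 if s > 0 else -1


def train_perceptron(samples, targets, rate=0.5, max_epochs=100):
    """Iterate a local update until every sample is classified correctly."""
    weights = [0.0] * len(samples[0])
    for _ in range(max_epochs):
        n_errors = 0
        for x, t in zip(samples, targets):
            y = threshold(sum(w * x_j for w, x_j in zip(weights, x)))
            error = t - y                      # error signal at the output
            if error != 0:
                n_errors += 1
                # Local rule: change depends only on error and input x_j.
                weights = [w + rate * error * x_j
                           for w, x_j in zip(weights, x)]
        if n_errors == 0:
            break
    return weights


# Bipolar AND; the last component of each sample is a constant bias input.
samples = [[-1, -1, 1], [-1, 1, 1], [1, -1, 1], [1, 1, 1]]
targets = [-1, -1, -1, 1]
weights = train_perceptron(samples, targets)
responses = [threshold(sum(w * x_j for w, x_j in zip(weights, x)))
             for x in samples]
print(weights, responses)
```

Unlike the outer product rule, the weights here are reached only after several passes through the samples, which is the iteration cost the text associates with hard learning.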

In the case of self-organizing learning algorithms we require
not that the specified inputs produce a particular response but
rather that they satisfy a general restriction, often imposed by
the structure of the network itself. Since there is no a priori
expected response, the learning rule for self-organizing systems
is simply

This type of learning procedure can be useful, for example, at
intermediate levels of a network where the purpose is not to
elicit an external response but rather to generate appropriate
internal representations of the information that is presented as
input to the network. There is a broad range of self-organizing
algorithms, the simplest of which is probably lateral inhibition
to enforce grandmother cell representations. The objective of
the learning procedure is to have each distinct pattern in an
input set of neurons activate a single neuron in a second set.
In the architecture shown in Fig. 2 this is accomplished via
inhibitory connections between the neurons in the second set.
Once a particular neuron in the second layer is partially turned
on for a specific pattern it prevents the connections to the
other neurons in the second set from assuming values that will
result in activity at more than one neuron. The details of the
dynamics of such procedures can be quite complex (e.g., see
Ref. 28), as can corresponding optical implementations. An
advantageous feature of optics in connection with
self-organization is that global training signals, such as fixed
lateral inhibition between all the neurons in a given layer, can
easily be broadcast with optical beams.
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Ftg.  2.  Two-layw  network  with  latonl  inhibition.  Connactiont 
ending  with  an  opan  drde  are  inhibitary. 
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The  basic  architecture  for  optical  implementation  of 
a neural computer is shown in Fig. 3. The figure presents a single stage of what may be a
multilayered system. The nonlinear processing elements (i.e., the neurons) are arranged in
planes. We have included a training plane for reasons which will become clear below.
Neurons in one plane are interconnected with the neurons in the same or other planes via
the third dimension. The strength of the interconnections is determined by the information
which is holographically stored in light sensitive media placed in the space separating the
neural planes. Volume, rather than thin, holograms are specified in Fig. 3 due to the much
greater storage capacity of volume holograms and the availability of excellent real-time
volume media. Photorefractive crystals are particularly attractive as holographic media in
this application because it is possible to record information in these crystals in real
time at very high density without degrading the photorefractive sensitivity. In this
section we discuss the factors that determine the maximum number of connections that can be
specified by a photorefractive crystal with a given set of physical characteristics. There
are three distinct factors that need to be considered: geometric limitations arising from
the basic principles of volume holography, limitations arising from the physics of
photorefractive recording, and limitations due to the learning algorithms.

The Fourier lenses in Fig. 3 transform the spatial position of each neuron into a spatial
frequency associated with light emitted by or incident on that neuron. An interconnection
between the ith neuron in the input plane and the jth neuron in the output plane is formed
by interfering light emitted by the input neuron with light emitted by the jth neuron in
the training plane. The image of the jth training neuron lies at the position of the jth
neuron in the output plane. The interference of the training signal and the input creates a
grating in the recording medium of the form

$$\Delta n(\mathbf{r}) \propto A_i A_j \exp(j\mathbf{K}_{ij}\cdot\mathbf{r}), \qquad (6)$$

where $A_i$ and $A_j$ are the amplitudes of the fields emitted by the ith and jth neurons,
respectively. $\mathbf{K}_{ij}$ is equal to $\mathbf{k}_i - \mathbf{k}_j$, where
$\mathbf{k}_i$ and $\mathbf{k}_j$ are the spatial frequencies at which the corresponding
amplitudes propagate in the volume medium. This grating directs an input beam at spatial
frequency $\mathbf{k}_\alpha$ into an output beam at spatial frequency $\mathbf{k}_\beta$
if these two beams satisfy the Bragg constraint that

$$\mathbf{k}_\alpha - \mathbf{k}_\beta = \mathbf{K}_{ij}. \qquad (7)$$

Fig. 3. Optical neural computer architecture.

This constraint is obviously satisfied if $\mathbf{k}_\alpha = \mathbf{k}_i$ and
$\mathbf{k}_\beta = \mathbf{k}_j$. In general this solution is not unique. However,
Psaltis et al. have shown that by placing the neurons on the input and output planes on
appropriate fractal grids of dimension 3/2 it is possible to ensure that only the ith input
neuron and the jth output neuron may be coupled by a grating with wave vector
$\mathbf{K}_{ij}$. In this case, recording a hologram between light from the ith input
neuron and the jth training neuron increases the connection strength between the ith input
and the jth output without directly affecting the connections between other neurons. If
instead of one neuron, patterns of neurons are active on the fractal grids of the input and
training planes, the hologram recorded in the volume, i.e., Eq. (6) summed over all active
pairs of neurons, is the outer product of the pattern on the input plane and the pattern on
the training plane. Exposing the hologram with a series of $M$ patterns yields the sum of
outer products described by Eq. (2). Note that the architecture shown in Fig. 3 is similar
to a joint Fourier transform correlator. The use of volume, rather than thin, holograms and
fractal grids destroys the shift invariance of the correlator, making this architecture a
totally shift-variant, arbitrarily interconnectable system.
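The outer-product recording described above can be illustrated numerically. The following is a minimal sketch, assuming ideal linear recording with no erasure between exposures; the function names and the two bipolar test patterns are hypothetical and are not taken from the paper.

```python
# Sketch: M holographic exposures modelled as a sum of outer products.
# Assumption: ideal, linear recording; earlier exposures are not erased.

def outer(y, x):
    """Outer product y x^T, returned as a list of rows."""
    return [[yi * xj for xj in x] for yi in y]

def record(exposures, n_out, n_in):
    """Accumulate w_ij = sum over exposures of y_i * x_j."""
    w = [[0.0] * n_in for _ in range(n_out)]
    for x, y in exposures:
        for i, row in enumerate(outer(y, x)):
            for j, value in enumerate(row):
                w[i][j] += value
    return w

def recall(w, x):
    """Read out the stored interconnections with an input pattern x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]

# Two orthogonal input patterns with their training patterns (hypothetical data).
x1, y1 = [1, 1, -1, -1], [1, -1]
x2, y2 = [1, -1, 1, -1], [-1, 1]
w = record([(x1, y1), (x2, y2)], n_out=2, n_in=4)

# Because x1 and x2 are orthogonal, each input recalls a scaled copy of its
# own training pattern with no crosstalk from the other stored pair.
print(recall(w, x1), recall(w, x2))
```

With non-orthogonal inputs the recalled pattern would also contain crosstalk terms weighted by the overlap between the stored inputs.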

A basic geometrical limitation on the density of interconnections achievable through volume
holograms is due to the finite volume $V$ of any real crystal. The refractive index
$n(\mathbf{r})$ of such a crystal under periodic boundary conditions may be represented in
the form

$$n(\mathbf{r}) = \sum_{\nu} n_\nu \exp(j\mathbf{k}_\nu\cdot\mathbf{r}), \qquad (8)$$

$$\mathbf{k}_\nu = 2\pi\left(\frac{\nu_1}{L_1}\hat{\mathbf{x}}_1 + \frac{\nu_2}{L_2}\hat{\mathbf{x}}_2 + \frac{\nu_3}{L_3}\hat{\mathbf{x}}_3\right), \quad \nu_i = 0, \pm 1, \pm 2, \ldots, \qquad (9)$$

where $n_\nu$ is the amplitude of the Fourier component at spatial frequency
$\mathbf{k}_\nu$ and $L_i$ is the length of the crystal in the $i$ direction. Since the
maximum spatial frequency which may be Bragg matched to diffract light at wavelength
$\lambda$ is $2k_0$, where $k_0 = 2\pi/\lambda$, the sum in Eq. (8) is finite in
holographic applications. The number of spatial frequencies in the sum is
$S \approx V/\lambda^3$. Psaltis et al. demonstrated that $S$ is sufficient to fully and
independently interconnect neural planes which are limited to fractal dimension 3/2. Thus
in this previous work the issue of these geometric limitations was fully resolved in the
condition that processing nodes in the input and output planes must be appropriately
arranged on fractal grids. Other geometric limitations arise due to finite numerical
apertures and the physics of holographic recording mechanisms. These factors may be shown
to contribute a scaling factor to $S$ which is independent of $V$ and $\lambda$. For
$V = 1$ cm$^3$ and $\lambda = 1$ µm, $V/\lambda^3$ is equal to $10^{12}$. In
interconnecting neurons arranged on fractal planes, even though the recording geometry
typically allows access to only 1% of grating wave vector space, we still may achieve
$10^{10}$ interconnections per cm$^3$.
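The order-of-magnitude estimate above is easy to check. The short sketch below simply evaluates $V/\lambda^3$ for a 1 cm$^3$ crystal at a 1 µm wavelength and applies the 1% accessibility factor quoted in the text; the function name and argument names are illustrative choices.

```python
# Sketch: geometric estimate of the number of independent gratings, S ~ V / lambda^3.

def grating_count(volume_cm3, wavelength_um, accessible_fraction=1.0):
    """Order-of-magnitude number of resolvable gratings in the crystal."""
    wavelength_cm = wavelength_um * 1e-4  # 1 um = 1e-4 cm
    return accessible_fraction * volume_cm3 / wavelength_cm ** 3

total = grating_count(1.0, 1.0)                             # full wave vector space
usable = grating_count(1.0, 1.0, accessible_fraction=0.01)  # 1% accessible
print(f"total ~ {total:.1e}, usable ~ {usable:.1e}")
```

The result scales linearly with crystal volume and inversely with the cube of the wavelength, so halving the wavelength raises the estimate eightfold.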

We now address the question of whether this large number of gratings can be supported in a
photorefractive crystal, i.e., do photorefractive crystals have the capability of
simultaneously storing gratings each with sufficient diffraction efficiency? In this paper
we answer this question based on simple arguments in the context of a neural architecture.
The conclusions we reach are the same as those we arrive at through a more thorough
examination of the problem. Photorefractive holograms are produced in electrooptic crystals
via the modulation of the index of refraction by the space charge field created by an
optically driven inhomogeneous charge distribution. A neural network architecture
implemented in volume holograms performs a transformation of the form

$$E_i^{\mathrm{in}} \exp(j\mathbf{k}_i\cdot\mathbf{r}) \exp(j\phi_i) + \mathrm{c.c.} = \sum_j \eta_{ij} \exp(j\psi_{ij}) \exp(j\mathbf{K}_{ij}\cdot\mathbf{r})\, E_j^{\mathrm{out}} \exp(j\mathbf{k}_j\cdot\mathbf{r}) \exp(j\phi_j) + \mathrm{c.c.} \qquad (10)$$

between the field amplitude, $E_j^{\mathrm{out}} \exp(j\mathbf{k}_j\cdot\mathbf{r})$, of
the jth neuron and the field amplitude, $E_i^{\mathrm{in}} \exp(j\mathbf{k}_i\cdot\mathbf{r})$,
incident on the input of the ith neuron. c.c. denotes the complex conjugate of the
preceding term. $\phi_i$ and $\phi_j$ are the phases of the field amplitudes corresponding
to the ith and jth neurons. $\psi_{ij}$ is the phase of the grating which connects the ith
and jth neurons. The field amplitude diffraction efficiencies $\eta_{ij}$ are proportional
to the component of the space charge density in the crystal at spatial frequency
$\mathbf{K}_{ij} = \mathbf{k}_i - \mathbf{k}_j$. The total space charge density due to $N$
stored gratings is constrained at every point in the crystal to be less than the acceptor
trap density. This implies that


$$\eta_0 \ge \left| \sum_i \sum_j \eta_{ij} \exp(j\psi_{ij}) \exp(j\mathbf{K}_{ij}\cdot\mathbf{r}) \right|, \qquad (11)$$

where $\eta_0$ is the maximum diffraction efficiency for the field amplitude when only one
grating is recorded. If $\psi_{ij}$ is an independent random variable uniformly distributed
over a full $2\pi$ range, with high probability the right-hand side of Eq. (11) will not
exceed a few times its standard deviation, $\sqrt{N/2}\,\eta_1$, where $\eta_1$ is the rms
value of $\eta_{ij}$. This fact allows us to find a simple limit for $\eta_1$ given by


Note that, although we have assumed that the sums in Eq. (11) are over a set of incoherent
sinusoids, this does not imply that the sum in Eq. (10) is incoherent. To illustrate this
point imagine that $\psi_{ij} = \phi_i - \phi_j$. In this case the terms in Eq. (10) add
coherently. However, if $\phi_i$ and $\phi_j$ are independent random variables the sums in
Eq. (11) still add incoherently. Thus a random phase term in the transmittance at each
neuron causes the charge densities stored in the crystal to add incoherently but does not
necessarily destroy the coherence of the optical system.
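The statement that $N$ randomly phased gratings add incoherently can be checked with a small Monte Carlo experiment. The sketch below makes simplifying assumptions that are not in the paper: every grating has the same amplitude `eta1`, the sum is sampled at a single point in the crystal, and the total phase of each grating at that point is drawn uniformly from $[0, 2\pi)$. Under those assumptions the standard deviation of the summed real gratings should be close to $\sqrt{N/2}\,\eta_1$.

```python
import math
import random

# Sketch: N sinusoidal gratings with independent uniform random phases,
# sampled at one point in the crystal. Equal amplitudes are an assumption.

def sum_of_gratings(n_gratings, eta1, rng):
    """One sample of the summed grating amplitude at a fixed point."""
    return sum(eta1 * math.cos(rng.uniform(0.0, 2.0 * math.pi))
               for _ in range(n_gratings))

def empirical_std(n_gratings, eta1, trials, seed=0):
    """Standard deviation of the summed amplitude over many random phase draws."""
    rng = random.Random(seed)
    samples = [sum_of_gratings(n_gratings, eta1, rng) for _ in range(trials)]
    mean = sum(samples) / trials
    return math.sqrt(sum((s - mean) ** 2 for s in samples) / trials)

N, ETA1 = 200, 0.01
measured = empirical_std(N, ETA1, trials=4000)
predicted = math.sqrt(N / 2.0) * ETA1  # incoherent (random-walk) scaling
print(f"measured {measured:.4f}, predicted {predicted:.4f}")
```

A fully coherent sum of the same gratings would instead grow as $N\eta_1$, which is why random phases allow many more gratings to share the available space charge.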

The holographic transformation described above can be used to implement neural
architectures which map an activity pattern described by the outputs $\{x_j\}$ of the
neurons on one neural plane to the outputs $\{y_i\}$ of the next neural plane. In a
coherent optical system $x_j$ is represented by $E_j^{\mathrm{out}} \exp(j\phi_j)$ and
$w_{ij}$ is represented by $\eta_{ij} \exp(j\psi_{ij})$. Since most simple optical
nonlinearities are based on absorption the transformation between $\{x_j\}$ and $\{y_i\}$
typically takes the form


where $f$ is a thresholding function implemented in the neural plane. This functional form
might be avoided using interferometric detection. In an incoherent optical system $x_j$ is
represented by $|E_j^{\mathrm{out}}|^2$ and $w_{ij}$ is represented by $\eta_{ij}^2$. The
transformation between $\{x_j\}$ and $\{y_i\}$ takes the form


(14) 


In either case the function $f$ must provide sufficient gain $G$ to regenerate the signal
power of the system after each layer. If we assume that each layer contains $\sqrt{N}$
neurons, the relationship between the power incident on a single neuron,
$I_{\mathrm{in}}$, and the power output by a single neuron, $I_{\mathrm{out}}$, for a
coherent system with $\psi_{ij} = \phi_i - \phi_j$ is

$$I_{\mathrm{in}} = \left| \sum_j \eta_{ij} \exp(j\psi_{ij})\, E_j^{\mathrm{out}} \exp(j\phi_j) \right|^2. \qquad (15)$$

From Eq. (12) we find

$$G = \frac{1}{2\eta_0^2}. \qquad (16)$$


For  an  incoherent  system  the  corresponding  relation¬ 
ship  is 


vW 


In this case Eq. (12) yields


(17)

Note that $1/G$ is the total diffraction efficiency of the volume hologram. Since this must
be less than 1 we know that $G > 1$. $\eta_0$ is determined by the physical properties of
the crystal, including the maximum charge density available for grating storage, the
thickness of the crystal, and its electrooptic coefficients.
For  small  ni  we  may  estimate  rm  as 


where  L  is  the  length  of  the  crystal  along  the  optical 
axis.  For  *  10“®,  X  »  10"*  m,  and  L  *  10"*  m,  »jo  * 
$O(1)$. This means that in coherent systems relatively little gain [i.e., $G = O(1)$] is
needed to recall a large number of sinusoidal gratings stored in a photorefractive crystal.
Of course as we attempt to store arbitrarily many gratings other limits arise, but at least
over a finite bandwidth of the electrooptic response of the crystal coherent systems should
have no difficulty in achieving interconnection densities of the order of those implied by
the geometrical constraints. Incoherent systems, on the other hand, are unable to take
advantage of holographic phase matching and are thus less efficient. To achieve
$N = 10^{10}$, for example, we must supply a gain of $G = 10^5$ in each neural plane.
Examples of how $G$ may be obtained optically include various combinations of image
intensifiers and spatial light modulators and multiwave mixing in nonlinear materials. For
example, an optically addressed spatial light modulator such as the Hughes liquid crystal
light valve is sensitive to ~10 µW/cm$^2$. If the read-out beam has an intensity of
1 W/cm$^2$ we achieve a gain of $10^5$.
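The contrast between coherent and incoherent read-out can be made concrete with a small calculation. The sketch below rests on assumptions that should be read as a plausible reading of the argument rather than the paper's own formulas: each layer holds $\sqrt{N}$ neurons, every grating has the rms field efficiency $\eta_1 = \sqrt{2/N}\,\eta_0$, field amplitudes add in phase in the coherent case, and intensities add in the incoherent case. The function name and the value chosen for `eta0` are illustrative.

```python
import math

# Sketch: gain needed per layer to restore signal power after diffraction
# from N stored gratings. All scaling assumptions are stated in the lead-in.

def required_gain(n_gratings, eta0, coherent):
    n_neurons = math.sqrt(n_gratings)          # neurons per layer
    eta1 = math.sqrt(2.0 / n_gratings) * eta0  # assumed rms field efficiency per grating
    if coherent:
        efficiency = (n_neurons * eta1) ** 2   # amplitudes add in phase, then square
    else:
        efficiency = n_neurons * eta1 ** 2     # intensities add
    return 1.0 / efficiency

ETA0 = 1.0 / math.sqrt(2.0)  # an O(1) single-grating efficiency, chosen for round numbers
for n in (1e6, 1e8, 1e10):
    print(f"N = {n:.0e}: coherent G = {required_gain(n, ETA0, True):.1f}, "
          f"incoherent G = {required_gain(n, ETA0, False):.1e}")
```

Under these assumptions the coherent gain stays of order one regardless of $N$, while the incoherent gain grows as $\sqrt{N}$, reaching $10^5$ at $N = 10^{10}$.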

The choice between coherent and incoherent implementations of optical neural networks
offers advantages and disadvantages on both sides. The incoherent system is easier to
implement but requires the large
gain  described  above  and  offers  only  unipolar  activities 
and  interconnection  strengths.  The  coherent  imple¬ 
mentation  offers  bipolar  activities  and  interconnec¬ 
tions  but  requires  rigid  phase  stability  in  the  optical 
system  over  potentially  very  long  learning  cycles. 
This stability is not difficult to achieve in prescribed
learning  architectures,  but  may  be  more  difficult  to 
achieve  in  adaptive  systems.  In  addition,  coherent 
systems  generally  square  the  signal  incident  on  the 
nonlinearity,  unless  interferometric  detection  is  used. 
Interferometric  detection  is  difficult  to  implement  in  a 
complex  optical  system.  Although  the  incoherent  sys¬ 
tem  is  straightforward  to  implement,  this  simplicity 
comes  at  a  cost  of  requiring  biasing  to  compensate  for 
unipolar  values  and  external  gain.  The  coherent  sys¬ 
tem  is  more  elegant  in  that  these  additional  mecha¬ 
nisms  are  not  necessary,  but  it  is  more  sensitive  to 
specific  design  issues.  One  way  of  making  coherent 
implementations  more  robust  might  be  to  include 
adaptive  optics,  such  as  phase  conjugate  devices,  to 
compensate for phase instabilities. Although these
devices might also be needed in adaptive incoherent
systems  to  detect  the  phase  of  a  grating  to  correctly 
update  the  associated  interconnection,  in  the  incoher¬ 
ent  case  it  is  only  necessary  to  detect  the  current  state 
of  the  phase.  In  the  coherent  case  it  is  generally  neces¬ 
sary  to  continuously  track  the  phase. 

IV. Learning Architectures

We  now  turn  to  the  question  of  how  we  can  specify 
the  strength  of  each  interconnection.  There  is  a  nice 
compatibility  between  simple  (multiplicative)  Heb- 
bian  learning  and  holography;  the  strength  of  the  con¬ 
nection  between  two  neurons  can  be  modified  by  re¬ 
cording  a  hologram  with  light  from  the  two  neurons. 
It is not possible, however, to record multiple holograms in a single crystal
independently. Thus far we have shown that the space charge in a photorefractive crystal
may be arranged to achieve a very large number of independent interconnections. The task
that remains is to find a means of using optical beams from
outside  the  crystal  to  correctly  arrange  the  3-D  charge 
distribution.  In  particular,  we  must  find  means  to 
address  the  full  3-D  bandwidth  of  the  crystal  from  2-D 
neural  planes.  To  successfully  implement  learning 
with  photorefractive  crystals  the  nonlinear  dynamics 
that  govern  the  multiple  exposure  of  holograms  in  a 
photorefractive  medium  must  be  reconciled  with  the 
nonlinear  equations  that  describe  the  iterative  proce¬ 
dures  of  learning  algorithms.  It  is  extremely  difficult 
to  fully  characterize  analytically  the  ability  of  an  opti¬ 
cal  system  to  simulate  a  particular  learning  algorithm. 
We  will  have  to  rely  heavily  on  experiment  in  the 
search  for  the  optimum  match  between  nonlinear  op¬ 
tics  and  learning  procedures  for  neural  networks.  In 
this  section  we  describe  learning  architectures  which 
are  relatively  simple  to  implement  experimentally  and 
which  can  be  used  to  evaluate  the  capability  of  photo¬ 
refractive  crystals  to  store  information  in  the  form  of 
connectivity  patterns  in  a  neural  computer. 

The  first  learning  algorithm  we  consider  is  the  pre¬ 
scribed  sum  of  outer  products  of  Eq.  (2).  As  we  saw  in 
the  previous  section,  a  sum  of  this  sort  may  be  imple¬ 
mented  as  a  series  of  exposures  of  a  volume  hologram. 
In  a  photorefractive  crystal,  the  exposure  of  a  new 
hologram  partially  erases  previously  recorded  holo¬ 
grams.  This  places  an  upper  limit  on  the  maximum 
number  of  holograms  that  can  be  recorded  and  thus 
the  number  of  associations  M  that  can  be  stored  in  the 
crystal.  The  limit  is  found  by  determining  the  mini¬ 
mum  tolerable  diffraction  efficiency  for  each  associa¬ 
tion  and  solving  for  the  number  of  exposures  that  will 
yield this efficiency. Let $A_m$ be the amplitude of the mth hologram recorded. After a
total of $M$ exposures,

$$A_m = A_0 \left[1 - \exp(-t_m/\tau_r)\right] \exp\left(-\sum_{m'=m+1}^{M} t_{m'}/\tau_e\right), \qquad (19)$$

where $A_0$ is the saturation amplitude of a hologram recorded in the photorefractive
crystal, $t_m$ is the exposure time for the mth hologram, and $\tau_r$ and $\tau_e$ are,
respectively, the characteristic time constants for recording and erasing a hologram in the
crystal. We allow for the case that $\tau_e \neq \tau_r$ in light of limited evidence that
this may be the case in some crystals. Ionic conductivity is one mechanism leading to
multiple time constants. We can use several different criteria for selecting the exposure
schedule. For example, if we require $A_m = A_{m+1}$ for all $m$ we obtain

$$\left[1 - \exp(-t_m/\tau_r)\right] \exp(-t_{m+1}/\tau_e) = 1 - \exp(-t_{m+1}/\tau_r). \qquad (20)$$

If $\tau_r = \tau_e$, the solution to Eq. (20) in the boundary condition $t_1 \gg \tau_r$ is

$$t_m = \tau_r \ln\left(\frac{m}{m-1}\right), \quad m > 1, \qquad (21)$$

which yields

$$A_m = \frac{A_0}{M}. \qquad (22)$$

For the case $\tau_e \neq \tau_r$ we define $\rho_m$ such that $t_m = \rho_m \tau_e$.
Since, from Eq. (19), $\lim_{M\to\infty} A_m = 0$, Eq. (20) may be satisfied only if
$\lim_{m\to\infty} t_m = 0$. Thus for some $m_0 > 1$, $\rho_{m_0} \ll 1$ and
$t_{m_0} \ll \tau_r$. Then, from Eq. (20),

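By induction, for $m > m_0$

$$\rho_m = \frac{1}{(m - m_0) + \dfrac{1}{\rho_{m_0}}} \qquad (24)$$

and

$$t_m = \frac{\tau_e}{(m - m_0) + \dfrac{\tau_e}{t_{m_0}}}. \qquad (25)$$

As $m$ grows large with $m_0$ fixed, Eq. (25) can be shown to yield

$$\rho_m \approx \frac{1}{m} \qquad (26)$$

and

$$t_m \approx \frac{\tau_e}{m}. \qquad (27)$$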


The value of $m$ for which the approximation holds increases with the ratio
$\tau_e/\tau_r$. In the case $\tau_r = \tau_e$, for example, $\tau_e/3t_3 = 0.82$ and
$\tau_e/10t_{10} = 0.95$. In any case, for $M \gg m_0$ for some $m_0$ satisfying the
constraints preceding Eq. (23),

$$A_m \approx \frac{\tau_e}{\tau_r} \frac{A_0}{M} \qquad (28)$$

for all $m$. Solving for $M$ with $A_m \ll A_0$ we find a limit for $M$ given by

$$M = \frac{\tau_e}{\tau_r} \frac{A_0}{A_m}. \qquad (29)$$

This result agrees well with what we might expect intuitively. The number of exposures
allowed increases in proportion with the ratio $\tau_e/\tau_r$ (if we erase slowly we can
store more holograms) and the ratio of the maximum possible and minimum detectable grating
amplitudes.
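The equal-amplitude exposure schedule can be verified numerically. The sketch below assumes the usual exponential record-and-erase model, in which a hologram written for time $t_m$ reaches $A_0[1 - \exp(-t_m/\tau_r)]$ and is then attenuated by $\exp(-t/\tau_e)$ during every later exposure, and it takes $\tau_r = \tau_e = \tau$ with a long first exposure. The function names and the factor of 50 used for the first exposure are illustrative choices.

```python
import math

# Sketch: exposure schedule t_m = tau * ln(m / (m - 1)) for equal time constants.
# Assumed model: exponential recording toward A0 and exponential erasure afterwards.

def schedule(n_exposures, tau, first_exposure=50.0):
    """Exposure times t_1..t_M; t_1 >> tau saturates the first hologram."""
    times = [first_exposure * tau]
    times += [tau * math.log(m / (m - 1)) for m in range(2, n_exposures + 1)]
    return times

def amplitudes(times, a0, tau_r, tau_e):
    """Final amplitude of each hologram after all exposures are complete."""
    result = []
    for m, t in enumerate(times):
        later = sum(times[m + 1:])  # total exposure time that erases hologram m
        result.append(a0 * (1.0 - math.exp(-t / tau_r)) * math.exp(-later / tau_e))
    return result

M, TAU, A0 = 10, 1.0, 1.0
t = schedule(M, TAU)
a = amplitudes(t, A0, TAU, TAU)
print([round(value, 4) for value in a])        # every entry is close to A0 / M
print(round(TAU / (3 * t[2]), 2), round(TAU / (10 * t[9]), 2))
```

The last line compares the exact exposure times with the large-$m$ approximation $t_m \approx \tau/m$; the ratio approaches 1 as $m$ grows.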


Fig. 4. Optical architecture for backward error propagation learning.


The second architecture we will discuss is capable of implementing the backward error
propagation algorithm in a multilayered network. The architecture, shown in Fig. 4, is a
variation on a system we described previously. The system as shown has two layers but an
arbitrary number of layers can be implemented as a straightforward extension. An input
training pattern is placed at plane $N_1$. The pattern is then interconnected to the
intermediate (hidden) layer $N_2$ via the volume hologram $H_1$. A 2-D spatial light
modulator placed at $N_2$ performs a soft thresholding operation on the light incident on
it, simulating the action of a 2-D array of neurons, and relays the light to the next
stage. Hologram $H_2$ interconnects $N_2$ to the output plane $N_4$ where a spatial light
modulator performs the final thresholding and produces a 2-D pattern representing the
response of the network to the particular input pattern. This output pattern is compared to
the desired output and the appropriate error image is generated (either optically or with
the aid of an image detector and rerecording) on the spatial light modulator $N_4$. The
undiffracted beams from $N_1$ and $N_2$ are recorded on spatial light modulators at $N_3$
and $N_5$, respectively. The signals stored at $N_3$, $N_4$, and $N_5$ are then illuminated
from the right so that light propagates back toward the left. The backpropagation algorithm
demands a change in the interconnection matrix stored in $H_2$ given by

$$\Delta w_{ij}^{(2)} = \alpha\, \epsilon_i\, f'(x_i^{(4)})\, y_j^{(2)}, \qquad (30)$$

where $\alpha$ is a constant, $\epsilon_i$ is the error signal at the ith neuron in $N_4$,
$x_i^{(4)}$ is the input diffracted onto the ith neuron in $N_4$ from $N_2$, $f'(x)$ is the
derivative of the thresholding function $f(x)$ which operates on the input to each neuron
in the forward pass, and $y_j^{(2)}$ is the output of the jth neuron in $N_2$. Each neuron
in $N_4$ is illuminated from the right by the error signal $\epsilon_i$ and the backward
transmittance of each neuron is proportional to the derivative of the forward output
evaluated at the level of the forward propagating signal. As we have described above, the
hologram recorded in $H_2$ is the outer product of the activity patterns incident from
$N_4$ and $N_5$. Thus the change made in the holographic interconnections stored in $H_2$
is proportional to the change described by Eq. (30).

The change in the interconnection matrix stored in $H_1$ required under the backpropagation
algorithm is

$$\Delta w_{lm}^{(1)} = \alpha \sum_i \epsilon_i\, f'(x_i^{(4)})\, w_{il}^{(2)}\, f'(x_l^{(2)})\, x_m^{(1)}, \qquad (31)$$

where $x_m^{(1)}$ is the activity on the mth input on $N_1$. The error signal applied to
$N_4$ produces a diffracted signal at the lth neuron in $N_2$ which is proportional to

$$\sum_i \epsilon_i\, f'(x_i^{(4)})\, w_{il}^{(2)}. \qquad (32)$$

We assume that, during the correction cycle for $H_1$, $N_5$ is inactive. Once again, if
the backward transmittance of the lth neuron is proportional to $f'(x_l^{(2)})$, the change
made to the hologram by the signals propagating back from $N_2$ and $N_3$ is proportional
to the change prescribed in Eq. (31).
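The two weight corrections described above can be written out for a small two-layer network and checked against a numerical gradient. This is a minimal sketch of standard backward error propagation, not of the optical system itself. It assumes a logistic thresholding function, an error defined as target minus output, and a squared-error cost; all names (`forward`, `updates`, `loss`) and the example weights are hypothetical.

```python
import math

# Sketch: backpropagation updates for a two-layer network.
# Assumptions: logistic threshold f, error = target - output, squared-error cost.

def f(x):
    return 1.0 / (1.0 + math.exp(-x))

def df(x):
    s = f(x)
    return s * (1.0 - s)

def forward(w1, w2, x1):
    x2 = [sum(w * v for w, v in zip(row, x1)) for row in w1]  # inputs to hidden neurons
    y2 = [f(v) for v in x2]                                   # hidden outputs
    x4 = [sum(w * v for w, v in zip(row, y2)) for row in w2]  # inputs to output neurons
    y4 = [f(v) for v in x4]                                   # network outputs
    return x2, y2, x4, y4

def updates(w1, w2, x1, target, alpha):
    x2, y2, x4, y4 = forward(w1, w2, x1)
    eps = [t - y for t, y in zip(target, y4)]        # error signal at the output plane
    d4 = [e * df(v) for e, v in zip(eps, x4)]        # error times backward transmittance
    dw2 = [[alpha * d * y for y in y2] for d in d4]  # outer product with hidden outputs
    back = [sum(d4[i] * w2[i][l] for i in range(len(d4)))
            for l in range(len(y2))]                 # error diffracted back to hidden layer
    d2 = [b * df(v) for b, v in zip(back, x2)]
    dw1 = [[alpha * d * x for x in x1] for d in d2]  # outer product with the input pattern
    return dw1, dw2

def loss(w1, w2, x1, target):
    y4 = forward(w1, w2, x1)[3]
    return 0.5 * sum((t - y) ** 2 for t, y in zip(target, y4))

W1 = [[0.2, -0.1, 0.4], [0.5, 0.3, -0.2]]
W2 = [[0.1, -0.3]]
X1, TARGET = [1.0, 0.5, -1.0], [1.0]
DW1, DW2 = updates(W1, W2, X1, TARGET, alpha=1.0)
print(DW1, DW2)
```

Both corrections are outer products of a backward-propagating error pattern and a forward-propagating activity pattern, which is the form a holographic exposure records.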

A key element in this architecture is the assumption that the spatial light modulators at
$N_2$ and $N_4$ may have transmittances which may be switched between a function $f(x)/x$
for the forward propagating signal and $f'(x)$ for the backpropagating signal. In both
cases $x$ represents the forward propagating signal. We have previously described how
nonlinear etalon switches might be used in this application. Electrooptic spatial light
modulators might also be used.

We have performed an experiment to show how a single layer of error driven learning might
be implemented. This experiment is shown schematically in Fig. 5. In this case, the stored
vectors $x^{(m)}$ correspond to 2-D patterns recorded on a liquid crystal light valve from
a video monitor. The output vectors $y^{(m)}$ correspond to the single bit output of the
detector D. An input vector is imaged onto a photorefractive crystal via two separate
paths. The strength of the grating between the image of the input along one path and the
image along the other path is read out by light propagating along the path of one of the
write beams in the orthogonal polarization, i.e., while the write beam incident on the
detector is linearly polarized, the other write beam is circularly polarized. The polarizer
P blocks the linearly polarized beam and one component of the diffracted circularly
polarized beam, passing only the orthogonally polarized diffracted beam. This allows
readout of the grating as it is being recorded. The diffracted light is imaged onto the
detector D. This system classifies input patterns presented to it into two classes
according to whether the output of the detector when the pattern is presented is high or
low. If during training a pattern we would like to classify as high yields a low response,
the hologram is reinforced by exposing the crystal to the interference of the two beams,
each carrying the image of that pattern. This exposure continues until the diffracted
output increases by a fixed amount. If a pattern which should be classified as low is found
during training to yield a diffracted output that is too high, the hologram diffracting
that pattern is erased by a fixed amount by exposing the crystal with only one of the
imaging beams. (One beam is blocked by the shutter SH.) An experimental learning curve
showing the diffracted intensities for each learning cycle for four training patterns in a
system implemented using an Fe-doped LiNbO$_3$ crystal is shown in Fig. 6. The system
classifies the patterns 0 and 2 as high and 1 and 3 as low. At first all patterns are low.
The first two learning cycles are intended to drive the outputs of 0 and 2 above threshold.
However, they have the undesired effect of also driving pattern 3 above threshold. Thus in
the third learning cycle 3 is erased. In this particular erase cycle the erasure was too
severe. Note that pattern 2 is erased in this cycle, even though there is no overlap
between this pattern and pattern 3. The reason for this is that the two images of pattern 3
are in focus only over a limited region of the crystal volume. Outside of this region the
unfocused image may erase the hologram formed by pattern 2. In the subsequent two cycles
patterns 0 and 2 are again reinforced. This has the unwanted effect of driving both
patterns 1 and 3 just above threshold. In the final two cycles patterns 1 and 3 are erased
until both are below threshold. At this point all patterns are correctly classified and
learning stops.

Fig. 5. Simple photorefractive learning system: PB is a polarizing beam splitter; L1 and L2
are imaging lenses; WP is a quarterwave plate; SH is a shutter; P is a polarizer; D is a
detector; M is a mirror.

Fig. 6. Experimental learning curves.
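The reinforce-or-erase procedure described above can be mimicked with a toy simulation. The sketch below is a deliberately crude model and not a description of the experiment: the crystal is treated as a 1-D array of non-negative grating strengths, the diffracted output is the overlap of a binary pattern with those strengths, reinforcement adds a fixed increment wherever the pattern is bright, and erasure multiplies the strength by a fixed factor wherever the erasing beam illuminates the crystal. The patterns, threshold, and step sizes are hypothetical.

```python
# Sketch: error-driven learning with reinforcement and selective erasure.
# Toy model only; patterns, threshold, and step sizes are hypothetical.

def diffracted_output(w, pattern):
    """Overlap of the pattern with the stored grating strengths."""
    return sum(wi * pi for wi, pi in zip(w, pattern))

def train(patterns, is_high, threshold=1.0, gain=0.6, erase=0.5, max_cycles=100):
    w = [0.0] * len(patterns[0])
    for cycle in range(1, max_cycles + 1):
        changed = False
        for pattern, high in zip(patterns, is_high):
            out = diffracted_output(w, pattern)
            if high and out <= threshold:
                # Reinforce: write a grating wherever the pattern is bright.
                w = [wi + gain * pi for wi, pi in zip(w, pattern)]
                changed = True
            elif not high and out > threshold:
                # Erase: attenuate the grating wherever the pattern illuminates.
                w = [wi * (1.0 - erase * pi) for wi, pi in zip(w, pattern)]
                changed = True
        if not changed:
            return w, cycle
    raise RuntimeError("training did not converge")

PATTERNS = [
    [1, 1, 1, 0, 0, 0, 0, 0],  # pattern 0, should read out high
    [0, 1, 1, 1, 0, 0, 0, 0],  # pattern 1, should read out low
    [0, 0, 0, 0, 1, 1, 1, 0],  # pattern 2, should read out high
    [0, 0, 0, 0, 0, 1, 1, 1],  # pattern 3, should read out low
]
IS_HIGH = [True, False, True, False]
weights, cycles = train(PATTERNS, IS_HIGH)
outputs = [diffracted_output(weights, p) for p in PATTERNS]
print(cycles, [round(o, 2) for o in outputs])
```

Because the overlapping pixels are shared between a high and a low pattern, erasing a low pattern also weakens part of a high one, which is the same kind of interaction between patterns that the learning curve in the experiment shows.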

In  this  experiment  the  photorefractive  crystal  acts 
as  a  2-D  modulator.  The  diffraction  efficiency  be¬ 
tween  the  two  imaging  paths  is  high  where  the  patterns 
0  and  2  overlap  and  low  where  patterns  3  and  1  overlap. 
As  mentioned  above,  a  problem  arises  in  the  fact  that 
the overlap is well defined only in the image plane,
meaning  the  crystal  must  be  thinner  than  the  depth  of 
focus  of  the  images.  To  utilize  the  full  capacity  of 
photorefractive  volume  holograms  it  will  be  necessary 
to  move  beyond  this  implementation  to  architectures 
utilizing the full 3-D capacity of the crystal as discussed above. Nevertheless, this
experiment demonstrates in a rudimentary way how learning in photorefractive crystals may
proceed.

V. Conclusion

Photorefractive crystals represent a promising interconnection technology for optical
neural computers. The ease of dynamic holographic modification of
interconnections  in  these  crystals  allows  the  imple¬ 
mentation  of  a  large  class  of  outer  product  learning 
networks.  The  density  of  interconnections  which  may 
be  implemented  in  these  crystals  is  limited  by  physical 
and  geometrical  constraints  to  the  range  of  from  10^  to 
per  cm^.  To  achieve  these  limits  consideration 
must be given to the exposure schedule of the crystal.

Abstract

Generalised Fourier correlators imposing finite system space-bandwidth products are
described and a class of binary filters is proposed. In pattern classification and signal
registration applications it is shown that for a class of signals the binary filters yield
the same asymptotic performance as the matched filter. It is hence adduced that a dynamic
range of a single bit in the filter suffices for classification purposes. The effects of
statistical side-lobe fluctuations and a finite system space-bandwidth product are included
in the analysis. It is demonstrated that performance improves in a natural fashion with
increase in the system space-bandwidth product for both the binary filter and the matched
filter.

1  INTRODUCTION 

Matched filters are commonly used in diverse applications in communication systems, signal
processing, and pattern classification, where the task is typically the recognition of a
particular signal or pattern immersed in noise. The principal theoretical argument
supporting the use of matched filters is the classical result: among the class of all
linear filters, matched filters maximise a (suitably defined) signal-to-noise ratio [1].
Practical implementations of matched filters (and linear,
shift-invariant  systems,  in  general — are  much  facilitated  by  the  fundamental  Fourier  convolution 
theorem  wherein  convolutions  (or  correlations)  in  one  domain  are  transformed  into  products  in 
the  Fourier  domain.  As  a  consequence,  relatively  simple  analog  implementations  such  as  optical 
Fourier-plane  correlators  [2],  and  digital  implementations  using  algorithms  such  as  the  Fast  Fourier 
Transform  [3]  abound. 

The implementation of the system transfer function for the matched filter, however, requires
a large dynamic range. A question of considerable theoretical and practical import is the
determination of minimal complexity filters which have minimal dynamic range requirements, and for
which good classification performance is still attained (vis-à-vis the matched filter). The issue here
is to determine the critical information needed for classification, and to discard redundant
information. In this paper we propose a class of low complexity binary filters which are a step toward
the resolution of this question. These filters encode information in the phase of the Fourier transform
of the desired signal and require a dynamic range of just one bit.

Our principal theoretical result concerning the binary filters is the following: For statistically
uncorrelated pattern classes the binary filters provide the same asymptotic classification performance
as the matched filter. In fact, the binary filters provide classification performance comparable to
(though bounded above by) the matched filter over all ranges.

These binary filters are of considerable practical importance. The requirement of a large
dynamic range for the filter (corresponding to the many bits required to represent each sample point)
is obviated, and just a single representation bit is utilised per sample point. The resultant decrease
in required memory storage paves the way for low cost, low complexity systems, both digital
and analog, which retain good classification performance. Of particular interest in optical filter
implementations is the recent availability of a two-dimensional binary spatial light modulator, the
magneto-optic device. We have demonstrated good classification in experimental optical correlators
with our binary filters implemented using these devices [4].

In the next section we define a general family of bounded space-bandwidth product Fourier
correlators, and formally prescribe the matched filter and the binary filter in this context. We also
outline the signal statistics that we utilise, and set up a performance measure which incorporates
information about both the correlation peak and the side-lobe energy for all the pattern classes.
In section 3 we analyse the performance of the matched filter and the binary filter in a two-class
pattern recognition problem where the patterns belong to well-defined statistical classes,
and are noise-free. We obtain analytical results for the performance measure as a function of
the system space-bandwidth product in the two cases. In section 4 we investigate the attrition in
classification performance in both systems when the input patterns are corrupted by additive noise.
Sections 5 and 6 are devoted to numerical solutions and discussions of the comparative classification
performance of the matched filter and the proposed binary filter: We demonstrate the monotonic
improvement in performance in both systems as the system space-bandwidth product is increased,
and show the asymptotic merging of the performance curves for the binary filter and the matched
filter.

Notation: Let $\omega$ be some fixed (but arbitrary) positive quantity. To each real-valued function,
$f$, of a real variable we associate its finite-domain Fourier transform $F_\omega$ formally defined by

$$F_\omega(u) = \int_{-\omega}^{\omega} f(x)\, e^{-i 2\pi u x}\, dx . \tag{1}$$

We will use the terminology "space" for the variable $x$ (the domain of the input signals) and
"frequency" for the variable $u$ (the domain of the associated Fourier transform).

2  FOURIER  CORRELATORS 

2.1 Bounded Space-Bandwidth Systems

The conventional Fourier correlator is shift invariant and admits signals of
infinite space-bandwidth product (SBP) without loss of information. In this paper we will analyse
the effect on classification performance of imposing a finite system space-bandwidth product. In
particular, we consider shift variant Fourier correlators which process inputs through windows
$(-\omega, \omega)$ in space, and $(-\nu, \nu)$ in frequency: For a given signal, $f(x)$, and reference, $h(x)$, the
output, $g(x)$, of the bounded space-bandwidth correlator is given by

$$g(x) = \int_{-\nu}^{\nu} F_\omega(u)\, H_\omega^{*}(u)\, e^{i 2\pi u x}\, du .$$

We define the system space-bandwidth product, which we denote by $p$, to be the product of the
widths of the spatial and frequency windows: $p = 4\omega\nu$.

We consider two representative pattern classes, $C_1$ and $C_2$. The input signals are real valued
functions, $f(x)$, which are sample realisations (drawn from some underlying probability
distribution) of one of the two pattern classes. We will denote by $f_j(x)$ the input conditioned upon
being drawn from pattern class $C_j$. For fixed system space-bandwidth product, $p$, we compare the
following two classifiers for different choices of reference signal, $h(x)$.

Matched Filter: The reference signal, $h(x)$, is chosen matched to the sample realisation, $f_1(x)$,
of class $C_1$. The correlation output, $g_j^M(x)$, for the matched filter conditioned upon class $C_j$ at the
input is given by

$$g_j^M(x) = \int_{-\nu}^{\nu} F_{\omega,j}(u)\, F_{\omega,1}^{*}(u)\, e^{i 2\pi u x}\, du . \tag{2}$$

If $\omega = \nu = \infty$, we have the classical matched filter. For finite $p$ a correlation peak is still produced
for class $C_1$. (Classification performance, however, deteriorates as $p$ decreases.) Note that the
matched filter above, in general, requires exponential dynamic range.

Binary Filter: The reference signal, $h(x)$, is chosen such that

$$H_\omega(u) = \mathrm{sgn}\{\mathrm{Re}\{F_{\omega,1}(u)\}\} .$$

The filter hence takes on values $-1$ and $+1$ only at each frequency, so that we have a dynamic range
of one bit. The correlation output, $g_j^B(x)$, of the binary filter conditioned upon class $C_j$ at the input
is given by

$$g_j^B(x) = \int_{-\nu}^{\nu} F_{\omega,j}(u)\, \mathrm{sgn}\{\mathrm{Re}\{F_{\omega,1}(u)\}\}\, e^{i 2\pi u x}\, du . \tag{3}$$

Note that the binary filter tracks the phase of $F_{\omega,1}(u)$, so that we can expect a correlation peak for
class $C_1$, but not for class $C_2$.

In figure 1 we demonstrate two correlations of a random one-dimensional input sequence; in
figure 1(a) the correlation was accomplished using a matched filter, while in figure 1(b) the
correlation was performed using a binary filter. As seen, the correlation peaks and side-lobe fluctuation
levels are essentially indistinguishable in the two cases.
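A comparison of the kind shown in figure 1 can be sketched in a discrete setting. The block below is illustrative only and is not the procedure used to produce the figure: it assumes a length-256 white Gaussian sequence, replaces the continuous windows by a circular correlation computed with the FFT, and uses the hypothetical helper names `matched_correlation` and `binary_correlation`.

```python
import numpy as np

def matched_correlation(f, ref):
    """Circular correlation of f with ref via the Fourier product F * conj(H)."""
    F, H = np.fft.fft(f), np.fft.fft(ref)
    return np.fft.ifft(F * np.conj(H)).real

def binary_correlation(f, ref):
    """Same correlator, but the filter keeps only sgn(Re H): one bit per frequency."""
    F, H = np.fft.fft(f), np.fft.fft(ref)
    B = np.where(H.real >= 0, 1.0, -1.0)
    # F is Hermitian-symmetric and B is real and even, so the output is real.
    return np.fft.ifft(F * B).real

rng = np.random.default_rng(0)
f1 = rng.standard_normal(256)   # class C1 realisation (also used as reference)
f2 = rng.standard_normal(256)   # independent class C2 realisation

g_matched = matched_correlation(f1, f1)
g_binary = binary_correlation(f1, f1)
g_cross = binary_correlation(f2, f1)

# Location of the largest output magnitude for each filter.
print(np.argmax(np.abs(g_matched)), np.argmax(np.abs(g_binary)))
# Binary-filter peak for class C1 against the largest response to class C2.
print(g_binary[0], np.max(np.abs(g_cross)))
```

Both filters place their peak at zero shift for the class $C_1$ input, while the class $C_2$ input produces only side-lobe level output. The absolute scales of the two outputs differ, so only peak-to-side-lobe ratios are comparable.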

2.2 Performance Measure

In characterising the classification performance of the two filters, we concentrate on two key
measures: The strength of the correlation peak, and the side-lobe structure. For specific sample
realisations not much can be said about the size of the side-lobes; however, if signal statistics are
known we can extract peak and side-lobe information from a consideration of the ensemble. In the
next section we describe a specific statistical structure for the two signal classes from which we can
obtain quantitative estimates of filter performance.

For $j = 1, 2$ let $g_j(x)$ denote filter output conditioned upon class $C_j$ being present at the
input. Define

$$\mu_j = \sup_x \{ |\mathrm{E}\{g_j(x)\}| \} , \qquad \eta_j = \sup_x \{ \mathrm{Var}\{g_j(x)\} \} .$$

We define the performance coefficient $\rho$ by

$$\rho = \frac{(\mu_1 - \mu_2)^2}{\eta_1 + \eta_2} . \tag{4}$$

The term in the numerator measures the relative size of correlation peaks for the two classes, while
the term in the denominator factors in the average energy in the side-lobes. The coefficient, $\rho$,
hence is an indicator of how well the filter discriminates class $C_1$ from class $C_2$.

We denote by $\rho^M$ and $\rho^B$, respectively, the performance coefficient for the matched filter and
the binary filter. We shall take system performance to be a monotonically increasing function of
the coefficient $\rho$, with the system with the largest $\rho$ realising the best performance.

Note that the form of the coefficient $\rho$ is similar to a signal-to-noise ratio, the "signal"
corresponding to class $C_1$ and the "noise" to class $C_2$. (In fact, when the output variable $g(x)$ is
Gaussian, and the a priori probabilities of the two classes are the same, it turns out that the form
of the Bhattacharyya coefficient [5] is identical to equation 4 for $\rho$.) From classical communication
theory we have that for correlational systems which are linear functionals of the input signal, the
peak signal-to-noise ratio for a signal immersed in white noise is obtained for the matched filter.
Hence we expect the classification performance of the binary filter to be bounded by that of the
matched filter.
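The performance coefficient of equation 4 can be estimated by simulation. The sketch below is a discrete stand-in, not the paper's analysis: it assumes unit-variance white Gaussian classes of length 64, circular correlation in place of the windowed correlator, and sample means and variances over 4000 trials in place of the ensemble quantities. The function name `performance_coefficient` is hypothetical.

```python
import numpy as np

def performance_coefficient(g1, g2):
    """rho = (mu1 - mu2)^2 / (eta1 + eta2), estimated from output ensembles.

    g1, g2: arrays of shape (trials, shifts) holding correlator outputs for
    class C1 and class C2 inputs; the sup over x becomes a max over shifts.
    """
    mu1 = np.max(np.abs(g1.mean(axis=0)))
    mu2 = np.max(np.abs(g2.mean(axis=0)))
    eta1 = np.max(g1.var(axis=0))
    eta2 = np.max(g2.var(axis=0))
    return (mu1 - mu2) ** 2 / (eta1 + eta2)

rng = np.random.default_rng(1)
trials, n = 4000, 64
f1 = rng.standard_normal((trials, n))   # class C1 realisations, sigma_1 = 1
f2 = rng.standard_normal((trials, n))   # class C2 realisations, sigma_2 = 1
F1, F2 = np.fft.fft(f1, axis=1), np.fft.fft(f2, axis=1)

matched = np.conj(F1)                        # filter matched to each C1 sample
binary = np.where(F1.real >= 0, 1.0, -1.0)   # one-bit filter sgn(Re F1)

rho_m = performance_coefficient(np.fft.ifft(F1 * matched, axis=1).real,
                                np.fft.ifft(F2 * matched, axis=1).real)
rho_b = performance_coefficient(np.fft.ifft(F1 * binary, axis=1).real,
                                np.fft.ifft(F2 * binary, axis=1).real)
print(f"matched: {rho_m:.2f}  binary: {rho_b:.2f}")
```

Because the coefficient is a ratio of a squared mean to a variance, it is unaffected by the overall scale of the filter, which is why the two filters can be compared directly.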


2.3 Signal Statistics

In order to facilitate analysis we assume a specific statistical structure for the ensemble of signals
in the two classes. We assume that the signals $f_1(x)$ and $f_2(x)$ corresponding to the two classes $C_1$
and $C_2$ are sample realisations of mutually independent, white random processes with

$$\mathrm{E}\{f_j(x)\} = 0 , \qquad \mathrm{E}\{f_j(x) f_j(y)\} = \sigma_j^2\, \delta(x - y) . \tag{5}$$

The signal classes have been restricted to be stationary and white in order to effect some
simplicity in the ensuing analysis. The stationarity constraint can be relaxed to allow correlation
functions of the form $r_j(x)\delta(x - y)$; the analysis for this case is essentially the same as for the case
we consider. With the added constraint that the process be Gaussian, one or both constraints can
be relaxed to encompass general correlation functions of the form $r_j(x, y)$.

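From equation 1 the real and imaginary parts of $F_{\omega,j}(u)$ are given by

$$\mathrm{Re}\{F_{\omega,j}(u)\} = \int_{-\omega}^{\omega} f_j(x) \cos 2\pi u x \, dx , \qquad
\mathrm{Im}\{F_{\omega,j}(u)\} = -\int_{-\omega}^{\omega} f_j(x) \sin 2\pi u x \, dx . \tag{6}$$

The random processes $f_j(x)$ are independent and zero mean. By virtue of the Central Limit
Theorem then, it can be readily seen to follow that $\mathrm{Re}\{F_{\omega,1}(u)\}$, $\mathrm{Im}\{F_{\omega,1}(u)\}$, $\mathrm{Re}\{F_{\omega,2}(u)\}$, and
$\mathrm{Im}\{F_{\omega,2}(u)\}$ are mutually independent Gaussian random processes with zero mean. Some algebraic
manipulation readily yields the following, where $\mathrm{sinc}\, x = \sin(\pi x)/(\pi x)$:

$$\mathrm{E}\{\mathrm{Re}\{F_{\omega,j}(u)\}\, \mathrm{Im}\{F_{\omega,j}(t)\}\} = 0 , \tag{7}$$

$$\mathrm{E}\{\mathrm{Re}\{F_{\omega,j}(u)\}\, \mathrm{Re}\{F_{\omega,j}(t)\}\} = \sigma_j^2 \omega \left[ \mathrm{sinc}\, 2\omega(u - t) + \mathrm{sinc}\, 2\omega(u + t) \right] , \tag{8}$$

$$\mathrm{E}\{\mathrm{Im}\{F_{\omega,j}(u)\}\, \mathrm{Im}\{F_{\omega,j}(t)\}\} = \sigma_j^2 \omega \left[ \mathrm{sinc}\, 2\omega(u - t) - \mathrm{sinc}\, 2\omega(u + t) \right] . \tag{9}$$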
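Equations 7-9 can be checked by quadrature. For a white process with $\mathrm{E}\{f_j(x)f_j(y)\} = \sigma_j^2\delta(x-y)$, each second moment collapses to a single integral over $(-\omega, \omega)$ of a product of the cosine and sine kernels in equation 6. The sketch below evaluates those integrals numerically and compares them with the closed forms; the sample values of $u$, $t$, and $\omega$ and the helper names are arbitrary choices made for illustration, and the normalised sinc, $\sin(\pi x)/(\pi x)$, is assumed.

```python
import numpy as np

def trapezoid(y, x):
    """Trapezoidal rule on a sampled integrand."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def covariances(u, t, omega, sigma2=1.0, n=200001):
    """Second moments of Re F(u), Im F(t) for a white process of strength sigma2."""
    x = np.linspace(-omega, omega, n)
    cu, ct = np.cos(2 * np.pi * u * x), np.cos(2 * np.pi * t * x)
    su, st = np.sin(2 * np.pi * u * x), np.sin(2 * np.pi * t * x)
    re_re = sigma2 * trapezoid(cu * ct, x)
    im_im = sigma2 * trapezoid(su * st, x)
    re_im = -sigma2 * trapezoid(cu * st, x)   # Im F carries a minus sign
    return re_re, im_im, re_im

def closed_form(u, t, omega, sigma2=1.0):
    """Right-hand sides of equations 8, 9, and 7, in that order."""
    s_minus = np.sinc(2 * omega * (u - t))    # np.sinc(x) = sin(pi x)/(pi x)
    s_plus = np.sinc(2 * omega * (u + t))
    return (sigma2 * omega * (s_minus + s_plus),
            sigma2 * omega * (s_minus - s_plus),
            0.0)

num = covariances(0.3, 0.7, omega=1.5)
ref = closed_form(0.3, 0.7, omega=1.5)
print(num)
print(ref)
```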
We also require the first and second moments of the random processes $\mathrm{sgn}\{\mathrm{Re}\{F_{\omega,j}(u)\}\}$ and
$|\mathrm{Re}\{F_{\omega,j}(u)\}|$. Define $r_\omega : \mathbb{R}^2 \to [-1, 1]$ by

$$r_\omega(u, t) = \frac{\mathrm{sinc}\, 2\omega(u - t) + \mathrm{sinc}\, 2\omega(u + t)}{(1 + \mathrm{sinc}\, 4\omega u)^{1/2} (1 + \mathrm{sinc}\, 4\omega t)^{1/2}} . \tag{10}$$

Note that from equation 8 it follows that for each $u$ and $t$, $r_\omega(u, t)$ is just the correlation coefficient
of the random variables $\mathrm{Re}\{F_{\omega,j}(u)\}$ and $\mathrm{Re}\{F_{\omega,j}(t)\}$. The following results can be readily shown

(cf.  ^1,  for  inatance). 


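$$\mathrm{E}[\mathrm{sgn}\{\mathrm{Re}\{F_{\omega,j}(u)\}\}] = 0 , \tag{11}$$

$$\mathrm{E}\{|\mathrm{Re}\{F_{\omega,j}(u)\}|\} = \sqrt{\frac{2\omega\sigma_j^2}{\pi}}\, \sqrt{1 + \mathrm{sinc}\, 4\omega u} , \tag{12}$$

$$\mathrm{E}[\mathrm{sgn}\{\mathrm{Re}\{F_{\omega,j}(u)\}\}\, \mathrm{sgn}\{\mathrm{Re}\{F_{\omega,j}(t)\}\}] = \frac{2}{\pi} \sin^{-1} r_\omega(u, t) , \tag{13}$$

$$\mathrm{E}\{|\mathrm{Re}\{F_{\omega,j}(u)\}|\, |\mathrm{Re}\{F_{\omega,j}(t)\}|\} = \frac{2\omega\sigma_j^2}{\pi} \Big[ (1 + \mathrm{sinc}\, 4\omega u)^{1/2} (1 + \mathrm{sinc}\, 4\omega t)^{1/2} (1 - r_\omega(u, t)^2)^{1/2}
+ (\mathrm{sinc}\, 2\omega(u - t) + \mathrm{sinc}\, 2\omega(u + t)) \sin^{-1} r_\omega(u, t) \Big] . \tag{14}$$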
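The moments in equations 12-14 are instances of standard results for a pair of zero-mean, jointly Gaussian variables with correlation coefficient $r$; the equations follow after scaling by the standard deviations $\sigma_j\sqrt{\omega(1 + \mathrm{sinc}\, 4\omega u)}$ implied by equation 8. A minimal Monte Carlo sketch of the unit-variance versions is given below. The sample size, the seed, and the function names are assumptions made for illustration.

```python
import numpy as np

def sign_and_modulus_moments(r, n=400_000, seed=0):
    """Monte Carlo moments of unit-variance Gaussians X, Y with correlation r."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    y = r * x + np.sqrt(1.0 - r ** 2) * rng.standard_normal(n)
    return (np.mean(np.abs(x)),                 # E|X|
            np.mean(np.sign(x) * np.sign(y)),   # E[sgn X sgn Y]
            np.mean(np.abs(x) * np.abs(y)))     # E|X||Y|

def closed_forms(r):
    """Unit-variance versions of equations 12, 13, and 14."""
    return (np.sqrt(2.0 / np.pi),
            2.0 / np.pi * np.arcsin(r),
            2.0 / np.pi * (np.sqrt(1.0 - r ** 2) + r * np.arcsin(r)))

for r in (-0.5, 0.0, 0.8):
    print(r, sign_and_modulus_moments(r), closed_forms(r))
```

The estimates agree with the closed forms to within sampling error, which at this sample size is of the order of a few parts in a thousand.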


3  TWO-CLASS  DISCRIMINATION 

3.1 The Matched Filter

Our consideration of the matched filter as a correlational system described by equation 2 differs
somewhat from the classical deterministic matched filter [1] in the inclusion of a finite system
space-bandwidth and the representation of both input and reference signals as members of a statistical
class. The performance coefficient, $\rho^M$, that we derive hence reflects the relative correlation peaks,
and the "noisy" side-lobe fluctuations averaged over the ensemble as a function of $p$ (the system
SBP).

We estimate the parameters, $\mu_j$ and $\eta_j$, in equation 4 in turn for the two classes using the
results tabulated in section 2.3.

Class $C_1$: The system output is given by

$$g_1^M(x) = \int_{-\nu}^{\nu} \left( |\mathrm{Re}\{F_{\omega,1}(u)\}|^2 + |\mathrm{Im}\{F_{\omega,1}(u)\}|^2 \right) e^{i 2\pi u x}\, du .$$

A simple computation yields:

$$\mu_1 = 4\omega\nu\sigma_1^2 , \tag{15}$$

$$\eta_1 = 64\omega^2\nu^2\sigma_1^4 \int_0^1 (1 - t)(\mathrm{sinc}\, 4\omega\nu t)^2\, dt . \tag{16}$$

Class $C_2$: The system output is given by

$$g_2^M(x) = \int_{-\nu}^{\nu} F_{\omega,2}(u)\, F_{\omega,1}^{*}(u)\, e^{i 2\pi u x}\, du . \tag{17}$$

The correlation peak and average side-lobe energy can again be simply estimated:

$$\mu_2 = 0 , \tag{18}$$

$$\eta_2 = 32\omega^2\nu^2\sigma_1^2\sigma_2^2 \int_0^1 (1 - t)(\mathrm{sinc}\, 4\omega\nu t)^2\, dt . \tag{19}$$

Defining $\alpha$ as a function of the space-bandwidth product $p$ by

$$\alpha(p) = \int_0^1 (1 - t)(\mathrm{sinc}\, p t)^2\, dt , \tag{20}$$

the performance coefficient of equation 4 is hence given by

$$\rho^M = \frac{\sigma_1^2/\sigma_2^2}{2\alpha(p)\left(1 + 2\sigma_1^2/\sigma_2^2\right)} . \tag{21}$$

Asymptotic results: The above expression can be readily evaluated for extreme values of the
system space-bandwidth product. For very low space-bandwidth products $\alpha(p)$ approaches 1/2, so
that

$$\rho^M \to \frac{\sigma_1^2/\sigma_2^2}{1 + 2\sigma_1^2/\sigma_2^2} \quad \text{as } p \to 0 .$$

For very high space-bandwidth products, on the other hand, $\alpha(p)$ asymptotically approaches the
value $1/2p$, so that

$$\rho^M \sim \frac{p\, \sigma_1^2/\sigma_2^2}{1 + 2\sigma_1^2/\sigma_2^2} \quad \text{as } p \to \infty .$$

The asymptotic results correspond well with intuition. For very low space-bandwidth products
we expect a low processing gain for the system as not much correlation matching can be
obtained. For high space-bandwidth products on the other hand, the use of uncorrelated signals at
the input yields large processing gains increasing linearly with the space-bandwidth product.
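The two limits of $\alpha(p)$ quoted above can be confirmed numerically. The sketch below evaluates the integral of equation 20 with the trapezoidal rule, assuming the normalised sinc, $\sin(\pi x)/(\pi x)$; the grid size and the helper name `processing_gain` are illustrative choices and not part of the analysis.

```python
import numpy as np

def alpha(p, n=400001):
    """alpha(p) = integral over (0, 1) of (1 - t) sinc(p t)^2, trapezoidal rule."""
    t = np.linspace(0.0, 1.0, n)
    y = (1.0 - t) * np.sinc(p * t) ** 2      # np.sinc(x) = sin(pi x)/(pi x)
    return float(np.sum((y[1:] + y[:-1]) * np.diff(t)) / 2.0)

def processing_gain(p):
    """Small-input-SNR gain 1 / (2 alpha(p)), to be compared with p."""
    return 1.0 / (2.0 * alpha(p))

for p in (1e-3, 8.0, 64.0, 256.0):
    # Columns: p, alpha(p), 2 p alpha(p), processing gain.
    print(p, alpha(p), 2.0 * p * alpha(p), processing_gain(p))
```

For small $p$ the integral tends to 1/2, and for large $p$ the product $2p\,\alpha(p)$ tends to one, so the gain $1/2\alpha(p)$ grows linearly with $p$.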

It is instructive to compare the performance measure given by equation 21 with the classical
matched filter result for the signal-to-noise ratio (SNR) of a deterministic signal immersed in white
noise. The processing gain of a classical system (defined to be the ratio of the output SNR to the
input SNR) is given essentially by the signal space-bandwidth product [1]. If we define $\sigma_1^2/\sigma_2^2$ to be
a measure of the input SNR for the statistical case under consideration, then the processing gain of
our system, in the limit of large $p$ and small input SNR, is given by $1/2\alpha(p) \approx p$, which is precisely
the classical result. (The additional input SNR dependent term present in the denominator of
equation 21 arises because the statistical side-lobe fluctuations are also taken into account in our
performance measure; this term will not be significant for low input SNR scenarios.) In fine, the
presence of a finite system space-bandwidth product manifests itself in a loss of processing gain;
the larger the space-bandwidth product, the more the processing gain realised by the system.


3.2  The  Binary  Filter 

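The system output conditioned upon class $C_j$ being present at the input is given by equation 3.
We again estimate the parameters, $\mu_j$ and $\eta_j$, for the two classes in turn.

Class $C_1$: The output of the system with $f_1(x)$ at the input is given by substitution in equation 3:

$$g_1^B(x) = \int_{-\nu}^{\nu} |\mathrm{Re}\{F_{\omega,1}(u)\}|\, e^{i 2\pi u x}\, du + i \int_{-\nu}^{\nu} \mathrm{Im}\{F_{\omega,1}(u)\}\, \mathrm{sgn}\{\mathrm{Re}\{F_{\omega,1}(u)\}\}\, e^{i 2\pi u x}\, du . \tag{22}$$

From equations 7 and 12 it then follows that

$$\mu_1 = \sup_x \{ |\mathrm{E}\{g_1^B(x)\}| \} = 2\nu\sigma_1 \sqrt{\frac{2\omega}{\pi}} \int_0^1 (1 + \mathrm{sinc}\, p t)^{1/2}\, dt , \tag{23}$$

where, as before, $p$ is the space-bandwidth product $4\omega\nu$.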

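Now, in equation 22 set

$$k_1(x) = \int_{-\nu}^{\nu} |\mathrm{Re}\{F_{\omega,1}(u)\}|\, e^{i 2\pi u x}\, du , \qquad
k_2(x) = \int_{-\nu}^{\nu} \mathrm{Im}\{F_{\omega,1}(u)\}\, \mathrm{sgn}\{\mathrm{Re}\{F_{\omega,1}(u)\}\}\, e^{i 2\pi u x}\, du .$$

Then

$$g_1^B(x) = k_1(x) + i k_2(x) .$$

Clearly, $k_1(x)$ and $k_2(x)$ are uncorrelated complex random processes with $k_2(x)$ being zero mean.
Hence

$$\mathrm{Var}\{g_1^B(x)\} = \mathrm{E}\{|k_1(x)|^2\} + \mathrm{E}\{|k_2(x)|^2\} - |\mathrm{E}\{k_1(x)\}|^2 .$$

Using equation 9 and equations 12-14 we obtain after some algebraic manipulation that

$$\mathrm{Var}\{g_1^B(x)\} = \frac{4\omega\nu^2\sigma_1^2}{\pi} \int_{-1}^{1}\!\int_{-1}^{1} \Big[ \mathrm{sinc}\{\tfrac{p}{2}(u - t)\}\, \sin^{-1} r_{p/4}(u, t) - \tfrac{1}{2}(1 + \mathrm{sinc}\, p u)^{1/2}
(1 + \mathrm{sinc}\, p t)^{1/2} \big(1 - \sqrt{1 - r_{p/4}(u, t)^2}\big) \Big] \cos 2\pi(u - t)\nu x \; du\, dt . \tag{24}$$

Note that $r_\omega(\nu u, \nu t) = r_{p/4}(u, t)$, which can be verified by direct substitution in the defining
equation 10 with $p = 4\omega\nu$.

No analytic expression is available in general for $\eta_1 = \sup_x \{\mathrm{Var}\{g_1^B(x)\}\}$, and we have to
resort to numerical evaluation for specified parameters $p$, $\sigma_1^2$, and $\sigma_2^2$. (Note that in general, the
supremum does not occur at $x = 0$.)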


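Class $C_2$: From equation 3, the output for class $C_2$ is given by

$$g_2^B(x) = \int_{-\nu}^{\nu} F_{\omega,2}(u)\, \mathrm{sgn}\{\mathrm{Re}\{F_{\omega,1}(u)\}\}\, e^{i 2\pi u x}\, du .$$

Again having recourse to section 2.3, we can show that

$$\mu_2 = \sup_x \{ |\mathrm{E}\{g_2^B(x)\}| \} = 0 , \tag{25}$$

$$\mathrm{Var}\{g_2^B(x)\} = \frac{4\omega\nu^2\sigma_2^2}{\pi} \int_{-1}^{1}\!\int_{-1}^{1} \mathrm{sinc}\{\tfrac{p}{2}(u - t)\}\, \sin^{-1} r_{p/4}(u, t)\, \cos 2\pi(u - t)\nu x \; du\, dt . \tag{26}$$

Again, no analytic expression can be found for $\eta_2 = \sup_x \{\mathrm{Var}\{g_2^B(x)\}\}$, in general, and we must
resort to numerical evaluation.

Define $\beta_0$, $\beta_1$, and $\beta_2$ as functions of the space-bandwidth product $p$ by

$$\beta_0(p) = \left[ \int_0^1 (1 + \mathrm{sinc}\, p t)^{1/2}\, dt \right]^2 , \tag{27}$$

$$\beta_1(p) = \sup_x \int_{-1}^{1}\!\int_{-1}^{1} \mathrm{sinc}\{\tfrac{p}{2}(u - t)\}\, \sin^{-1} r_{p/4}(u, t)\, \cos 2\pi(u - t)\nu x \; du\, dt , \tag{28}$$

$$\beta_2(p) = \sup_x \int_{-1}^{1}\!\int_{-1}^{1} \Big\{ \mathrm{sinc}\{\tfrac{p}{2}(u - t)\}\, \sin^{-1} r_{p/4}(u, t) - \tfrac{1}{2}(1 + \mathrm{sinc}\, p u)^{1/2}
(1 + \mathrm{sinc}\, p t)^{1/2} \big\{1 - (1 - r_{p/4}(u, t)^2)^{1/2}\big\} \Big\} \cos 2\pi(u - t)\nu x \; du\, dt . \tag{29}$$

Combining the results of equations 23-26, and using the defining equations 27-29 we obtain the
performance coefficient, $\rho^B$, of the binary filter to be

$$\rho^B = \frac{2\beta_0(p)\, \sigma_1^2/\sigma_2^2}{\beta_1(p) + \beta_2(p)\, \sigma_1^2/\sigma_2^2} . \tag{30}$$

We will return to a comparative analysis of the expressions 21 and 30 in section 5.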


4  CLASSIFICATION  IN  ADDITIVE  NOISE 

In practice, the issue of system robustness in the face of signal degradations and noise becomes
important. We illustrate how noisy signals result in performance attrition in the two correlator
systems.

We consider the case where the input signal $f(x)$ is contaminated by an additive noise term
$n(x)$. (We assume that the reference signal, $h(x)$, being known a priori, can be represented
in a reasonably accurate and noise-free manner.) We take $n(x)$ to be an independent noise process
which is additive and white with

$$\mathrm{E}\{n(x)\} = 0 , \qquad \mathrm{E}\{n(x) n(y)\} = \sigma^2 \delta(x - y) .$$

The input signal term is then $f_j(x) + n(x)$, and the reference signal term (matched to class $C_1$) is
$f_1(x)$.

4.1  The  Matched  Filter 

Let $g_{j,n}^M(x)$ denote the (noisy) correlation output of the system when the input signal is a noisy
version of class $C_j$, viz., $f_j(x) + n(x)$. Then

$$g_{j,n}^M(x) = g_j^M(x) + g_n^M(x) ,$$

where the first term, $g_j^M(x)$, is the noise-free system response of equation 2 and the second term,
$g_n^M(x)$, is the additive noise term in the output correlation. The noise term is independent of the signal
term, and is zero mean with peak variance at the origin

$$\mathrm{Var}\{g_n^M(0)\} = 2p^2\sigma_1^2\sigma^2\alpha(p) ,$$

identical in form to equation 19. Hence, using equations 15-20 we have

$$\mu_{1,n} = \sup_x \{ |\mathrm{E}\{g_{1,n}^M(x)\}| \} = \mu_1 ,$$

$$\eta_{1,n} = \sup_x \{ \mathrm{Var}\{g_{1,n}^M(x)\} \} = 2p^2\sigma_1^2\alpha(p)(2\sigma_1^2 + \sigma^2) ,$$

$$\eta_{2,n} = \sup_x \{ \mathrm{Var}\{g_{2,n}^M(x)\} \} = 2p^2\sigma_1^2\alpha(p)(\sigma_2^2 + \sigma^2) .$$

The performance coefficient $\rho_n^M$ for the matched filter when input noise is present is hence
given by

$$\rho_n^M = \frac{\mu_{1,n}^2}{\eta_{1,n} + \eta_{2,n}} = \frac{\sigma_1^2/(\sigma_2^2 + 2\sigma^2)}{2\alpha(p)\left(1 + \dfrac{2\sigma_1^2}{\sigma_2^2 + 2\sigma^2}\right)} , \tag{31}$$

where $\alpha(p)$ is as defined in equation 20.

A comparison of equations 21 and 31 shows that the presence of additive input noise is
equivalent to an additive increase in the variance (or spread) of class $C_2$ by exactly twice the spread
of the noise.
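The step from the noisy peak and variance terms to equation 31 can be written out explicitly. The lines below are a worked restatement under the quantities already given ($\mu_{1,n} = \mu_1 = p\sigma_1^2$ with $p = 4\omega\nu$, and a vanishing mean for class $C_2$); the symbol $\rho_n^M$ for the noisy matched-filter coefficient is a labelling assumption.

```latex
\begin{aligned}
\mu_{1,n}^2 &= p^2 \sigma_1^4 , \\
\eta_{1,n} + \eta_{2,n}
  &= 2p^2\sigma_1^2\alpha(p)\,(2\sigma_1^2 + \sigma^2)
   + 2p^2\sigma_1^2\alpha(p)\,(\sigma_2^2 + \sigma^2) \\
  &= 2p^2\sigma_1^2\alpha(p)\,\bigl(2\sigma_1^2 + \sigma_2^2 + 2\sigma^2\bigr) , \\
\rho_n^M = \frac{\mu_{1,n}^2}{\eta_{1,n} + \eta_{2,n}}
  &= \frac{\sigma_1^2}{2\alpha(p)\,\bigl(2\sigma_1^2 + \sigma_2^2 + 2\sigma^2\bigr)}
   = \frac{\sigma_1^2/(\sigma_2^2 + 2\sigma^2)}
          {2\alpha(p)\,\bigl(1 + 2\sigma_1^2/(\sigma_2^2 + 2\sigma^2)\bigr)} .
\end{aligned}
```

Setting $\sigma^2 = 0$ recovers equation 21, which makes the substitution $\sigma_2^2 \to \sigma_2^2 + 2\sigma^2$ explicit.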


4.2  The  Binary  Filter 

Tracing through an analogous analysis yields the performance coefficient $\rho_n^B$ for the binary filter
when the input is degraded by additive noise. In general, however, it turns out that the form of $\rho_n^B$
is not conducive to a convenient representation as in equation 30 for the noise-free case; specifically,
in equation 29, the functional $\beta_2(p)$ has to be replaced by a more complicated supremum taken
over the sum of two integrals, the coefficient of one being $\sigma_1^2$, and of the other being $\sigma^2$. (The
supremum is now a function of not only the space-bandwidth product $p$, but also of the signal and
noise variances.) Using $\sup\{A + B\} \le \sup\{A\} + \sup\{B\}$, we can arrive at the following convenient
lower bound estimate for $\rho_n^B$ for the sake of comparison:

$$\rho_n^B \ge \frac{2\beta_0(p)\, \sigma_1^2/(\sigma_2^2 + 2\sigma^2)}{\beta_1(p) + \beta_2(p)\, \sigma_1^2/(\sigma_2^2 + 2\sigma^2)} , \tag{32}$$

with the functionals $\beta_0(p)$, $\beta_1(p)$, and $\beta_2(p)$ given by equations 27-29.

On comparing equations 30 and 32 we see that the effect of additive noise is to create a
larger effective spread for class $C_2$ just as in the case of the matched filter. In both cases, the noise
effectively reduces the ability of the system to pick out class $C_1$ by increasing side-lobe energy, and
at the same time increasing the correlation spread of class $C_2$.


5  NUMERICAL  SOLUTIONS  AND  DISCUSSION 

Let $\sigma_r^2$ denote the ratio $\sigma_1^2/(\sigma_2^2 + 2\sigma^2)$. We will refer to $\sigma_r^2$ as the class spread ratio; in essence
$\sigma_r^2$ is a statistical measure of the relative strengths of "signal" (class $C_1$) and "noise" (class $C_2$, and
additive noise) at the input of the correlational system. Recapitulating the expressions for the
performance coefficients for easy reference, we have

$$\rho^M = \frac{\sigma_r^2}{2\alpha(p) + 4\alpha(p)\sigma_r^2} , \qquad
\rho^B \ge \frac{2\beta_0(p)\sigma_r^2}{\beta_1(p) + \beta_2(p)\sigma_r^2} ,$$

with equality for $\rho^B$ when no noise is present, and where the functionals $\alpha(p)$, $\beta_0(p)$, $\beta_1(p)$, and
$\beta_2(p)$ are defined, respectively, in equation 20 and equations 27-29.

A numerically generated family of performance curves for the two systems is depicted in
figures 2 and 3. In each figure the performance coefficient, $\rho$, is plotted as a function of the class
spread ratio, $\sigma_r^2$, and the family of curves is generated by varying the space-bandwidth parameter
$p$ between 8 and 256. In order to facilitate comparison between the matched filter and the binary
filter, for values of $p = 8$ and $p = 256$ the corresponding performance curves of the two systems
are extracted from figures 2 and 3, and plotted on the same graph in figures 4 and 5.

It can be immediately seen from the figures that, all other things being held constant,
the performance coefficient $\rho$ is a monotonically increasing function of the system space-bandwidth
product for both filtration systems. This is clearly in accordance with our expectations as increasing
the system space-bandwidth product is equivalent to increasing the size of the windows in the space
and frequency domains, so that a greater degree of correlation matching can be obtained.

Now, when the class spread ratio, $\sigma_r^2$, is large, we have a situation where the noise power, $\sigma^2$,
and the class $C_2$ spread, $\sigma_2^2$, are both much smaller than the class $C_1$ spread, $\sigma_1^2$. This can be
viewed as essentially saying that patterns of class $C_1$ can take on values from a much wider set than
can patterns of class $C_2$ and the noise patterns. The probability of significant cross-correlation in
any particular case is then quite small, so that we expect good classification performance for large
values of $\sigma_r^2$. This intuitive expectation is echoed in figures 2-5, where we see that for the matched
filter and the binary filter, the performance coefficient $\rho$ is a monotonically increasing function of
the class spread ratio, $\sigma_r^2$, for each performance curve (corresponding to fixed $p$).

For the matched filter, a close examination of the asymptotes and the slope near the origin
of each performance curve reveals that "large $p$" behaviour holds for relatively small values of the
system space-bandwidth product (as small as $p = 8$). The asymptote of the performance curve for the
matched filter is approximately $p/2$, and the graph near the origin is a straight line with positive
slope $p$.
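The slope and asymptote of the matched-filter curves can be examined directly from the closed-form expression $\rho^M = \sigma_r^2 / (2\alpha(p) + 4\alpha(p)\sigma_r^2)$. The sketch below is a numerical illustration rather than a reproduction of figure 2: it assumes the normalised sinc in $\alpha(p)$, approximates the initial slope and the plateau by evaluating the curve at very small and very large $\sigma_r^2$, and uses hypothetical function names.

```python
import numpy as np

def alpha(p, n=400001):
    """alpha(p) = integral over (0, 1) of (1 - t) sinc(p t)^2, trapezoidal rule."""
    t = np.linspace(0.0, 1.0, n)
    y = (1.0 - t) * np.sinc(p * t) ** 2
    return float(np.sum((y[1:] + y[:-1]) * np.diff(t)) / 2.0)

def rho_matched(sigma_r2, p):
    """Matched-filter performance coefficient as a function of class spread ratio."""
    a = alpha(p)
    return sigma_r2 / (2.0 * a + 4.0 * a * sigma_r2)

for p in (8, 32, 256):
    slope = rho_matched(1e-6, p) / 1e-6   # initial slope, equal to 1/(2 alpha)
    plateau = rho_matched(1e6, p)         # large-ratio limit, close to 1/(4 alpha)
    # Ratios near one indicate slope ~ p and asymptote ~ p/2.
    print(p, slope / p, plateau / (p / 2))
```

Both ratios stay within roughly ten percent of one even at $p = 8$ and move closer to one as $p$ grows.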

Though $\rho^B$ is always bounded from above by $\rho^M$, for large class spread ratios the performance
curve of the binary filter approaches the same asymptote, $p/2$, as the matched filter, so that their
performance is virtually identical. An examination of their relative performance for each $p$ in the
range considered indicates that when the class spread ratio is unity (i.e., the two classes have the
same variance), we have $\rho^B \approx 2\rho^M/3$.

6  CONCLUSION 

These numerical simulations, coupled with the prior success of experimental systems utilising binary
filters [4], tend to bolster the intuitive notion that the phase of the Fourier Transform contains most
of the information content in the signal. The significance of the results lies in the demonstration
that, for classification purposes, most of the information content in the signal can be extracted
with filters of low complexity. Specifically, the binary filters of this paper require only a single
bit dynamic range but provide classification performance comparable to the matched filter, which
is far more demanding in its dynamic range requirements. While the success of these schemes
is very encouraging, some questions remain: We have demonstrated binary correlator structures
based on heuristic algorithms; however, it is not immediately obvious whether we can specify
optimum binary correlator structures for a given problem. As a specific instance, we can obtain
filters which maximally separate pattern classes in that the filter is orthogonal to all unwanted
patterns, while yielding a significant correlation only if the desired pattern is present. It is not
clear, however, whether an algorithm can be specified which yields the binary filter which is the
best approximation to any such maximally separating filter.
Fig. 1(a). Correlation of a random sequence using a Matched Filter [4].

Fig. 2. Plot of the performance coefficient, $\rho^M$, of the Matched Filter vs. the class
spread ratio, $\sigma_r^2$, with the system space-bandwidth product, $p$, as a parameter.

Fig. 3. Plot of the performance coefficient, $\rho^B$, of the Binary Filter vs. the class spread
ratio, $\sigma_r^2$, with the system space-bandwidth product, $p$, as a parameter.
Abstract. The properties of higher order memories are described. The non-redundant, up to Nth order polynomial
expansion of N-dimensional binary vectors is shown to yield orthogonal feature vectors. The properties of expansions
that contain only a single order are investigated in detail and the use of the sum of outer product algorithm for
training higher order memories is analyzed. Optical implementations of quadratic associative memories are described
using volume holograms for the general case and planar holograms for shift invariant memories.


1.  INTRODUCTION 

An associative memory can be thought of as a system
that stores a prescribed set of vector pairs $(x^m, y^m)$ for
$m = 1, \ldots, M$ and also produces $y^m$ as its output when
$x^m$ becomes its input. We denote by $N$ and $N_0$ the
dimensionalities of the input and output vectors, respectively.
When the output vectors are stored as binary
$N_0$-tuples, the associative memory can be implemented
as an array of discriminant functions, each dichotomizing
the input vectors into two classes. This type of
associative memory is shown schematically in Figure
1. In evaluating the effectiveness of a particular associative
memory we are concerned with its ability to
store a large number of associations (capacity), the ease
with which the parameters of the memory can be set
to realize the prescribed mappings (learning), and how
it responds to inputs that are not members of its training
set (generalization). In this paper we discuss a class of
associative memories known as higher order memories
that have been recently investigated by a number of
separate research groups (Baldi & Venkatesh, 1987;
Chen et al., 1986; Giles & Maxwell, 1987; Maxwell,
Giles, Lee, & Chen, 1986; Newman, 1987; Poggio,
1975; Psaltis & Park, 1986; Sejnowski, 1986). Our
motivation for investigating these memories was the
increase in storage capacity that results from the increase
in the number of independent parameters or degrees
of freedom that is needed to describe a higher order
associative mapping. The relationship between the
degrees of freedom of a memory and its ability to store
associations (Abu-Mostafa & Psaltis, 1985) is fundamental
to this work and we state it in the following
subsection as a theorem.

* Funded by the Air Force Office of Scientific Research, the Army
Research Office and the Defense Advanced Research Projects Agency.

† Dr. Hong is now with the Rockwell Science Center, Thousand
Oaks, CA 91360.

Requests for reprints should be sent to Demetri Psaltis, Department
of Electrical Engineering, California Institute of Technology,
Pasadena, CA 91125.

1.1  Degrees  of  Freedom  and  Storage  Capacity 

Let $D$ be the number of independent variables (degrees
of freedom) we have under our control to specify
input-output mappings and let each parameter have
$K$ separate levels or values that it can assume. We define
the storage capacity $C$ to be the maximum number of
arbitrary associations that can be stored and recalled
without error.


Theorem 1.

$$C \le \frac{D \log_2 K}{N_0} \tag{1}$$

Proof: The number of different states of memory is given
by $K^D$ and the total number of outputs that a given set
of $M$ input patterns can be mapped to is $2^{N_0 M}$. If the
number of mappings were larger than the number of
distinct states of the memory, then mappings would
exist that are not implementable. Requiring that all
mappings can be done leads to the relationship of the
theorem.
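The counting argument of the proof can be illustrated with a few lines of arithmetic. The sketch below evaluates the bound $D \log_2 K / N_0$ and the underlying condition $K^D \ge 2^{N_0 M}$; the function names and the example memory (1000 binary-valued parameters and 10 output bits) are hypothetical values chosen for illustration. The condition is necessary for storing arbitrary associations, and it does not by itself guarantee that a given architecture achieves the bound.

```python
import math

def capacity_bound(d, k, n_out):
    """Upper bound of Theorem 1: C <= D * log2(K) / N0."""
    return d * math.log2(k) / n_out

def all_mappings_representable(m, d, k, n_out):
    """Counting condition from the proof: K**D states must cover 2**(N0*M) mappings."""
    return k ** d >= 2 ** (n_out * m)

# Hypothetical memory: 1000 binary-valued parameters, 10 output bits.
bound = capacity_bound(d=1000, k=2, n_out=10)
print(bound)                                          # 100.0
print(all_mappings_representable(100, 1000, 2, 10))   # True
print(all_mappings_representable(101, 1000, 2, 10))   # False
```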

The equality in (1) is achieved by Boolean circuits
such as programmable logic arrays and an extreme case
of a higher order memory we will discuss later. When
the equality holds, resetting any one bit in any one of
the parameters of the memory gives a different mapping.
Such a memory cannot learn from the training
set to respond in some desirable way to inputs that it
has never seen before. The only way to get generalization
when $C = D \log_2 K / N_0$ is to build it into the overall
structure of the memory before learning begins. One
of the appealing features of neural architectures is the
considerable redundancy in the degrees of freedom that
is typically available. Therefore, there is hope that while
a memory learns specific input-output correspondences
it can also discover the underlying structure that
may exist in the problem and learn to respond correctly
for a set of inputs much larger than the training set.
Moreover, the same redundancy is responsible for the
error tolerance that is evident in many neural architectures.
Higher order memories are generally redundant
and they can provide us with a methodology for
selecting the degree of redundancy along with the number
of degrees of freedom and the associated capacity
to store random problems.

It  is  important  to  keep  in  mind  that  (1)  holds  for 
arbitrary  mappings.  If  the  input  and  output  vectors  are 
restricted  in  some  way  that  happens  to  be  matched  to 
the  architecture  of  a  particular  associative  memory  then 
it may be possible to overcome this limit. However,
selecting  the  architecture  of  the  associative  memory 
such  that  it  optimally  implements  only  a  subset  of  all 
possible  associations  is  basically  equivalent  to  choosing 
the  architecture  so  that  it  generalizes  in  a  desirable  way. 
For  instance  suppose  that  we  design  an  associative 
memory  so  that  it  is  shift  invariant  (i.e.,  the  output  is 
insensitive  to  a  change  in  the  position  of  the  input) 
(Maxwell  et  al.,  1986;  Psaltis  &  Hong,  1987).  Then  this 
system will respond predictably to all the shifted versions
of the patterns that were used to train it. We can
equivalently  think  of  this  system  as  having  a  larger  stor¬ 
age  capacity  than  the  limit  of  (1)  over  the  set  of  shift 
invariant  mappings.  If  we  can  identify  a  priori  the  types 
of  generalization  we  wish  the  memory  to  exhibit  and 
we  can  find  ways  to  impose  these  on  the  architecture, 
then  this  is  certainly  a  sensible  thing  to  do.  Higher  order 
memories  can  also  provide  a  convenient  framework 
within  which  this  can  be  accomplished. 

The penalty we must pay for the increase in the storage capacity that is afforded by the increase in the degrees of freedom in a higher order associative memory is an increase in implementation complexity. The computer that implements a higher order memory must have sufficient storage capacity to store a very large number of parameters. Moreover it must be capable of addressing the stored information with a high degree of parallelism in order to produce an output quickly. We will discuss in this paper optical implementations of second order memories and we will show a remarkable compatibility between the computational requirements of these memories and the ability of optics to store information in three dimensions.


1.2  Linear  Discriminant  Functions  and  Associative  Memories

We will consider as a precursor the most familiar associative memories that are constructed as arrays of linear discriminant functions (Kohonen, 1984). A linear discriminant function is a mapping from the sample space $X$, a subset of $R^N$, to 1 or -1:

$$y = \mathrm{sgn}\{w^t x + w_0\} = \mathrm{sgn}\{w_0 + w_1 x_1 + w_2 x_2 + \cdots + w_N x_N\} \tag{2}$$

where sgn is the signum function, $w$ is a weighting vector and $w_0$ is a threshold value. In this case the capacity is upper bounded by $(N + 1)\log_2 K$ according to our definition of capacity. In this relatively simple case the exact capacity is known to be equal to $C = N + 1$, assuming the input points are in general position and $K \to \infty$ (Cover, 1965). An associative memory is constructed by simply forming an array of linear discriminant functions, each mapping the same input to a different binary variable. Several algorithms exist for training such memories including the perceptron, Widrow-Hoff, sum of outer products, pseudoinverse, and simplex methods (Duda & Hart, 1973; Hopfield, 1982; Kohonen, 1984; Venkatesh & Psaltis, in press). This memory can be thought of as the first order of the broader class of higher order memories that contain not only a linear expansion of the input vector but also quadratic and higher order terms. We will see in Section 3 that the learning methods that are applicable to the linear memories generalize directly to the higher order memories. First, however, we will describe the properties of the mappings that are implementable with higher order memories in Section 2. Finally, in Section 4 we will describe optical implementations of quadratic optical memories (Psaltis, Park, & Hong, 1986).


High  Order  Associative  Memories

2.  PROPERTIES  OF  HIGHER
ORDER  MEMORIES

A $\Phi$-function is defined to be a fixed mapping of the input vector $x$ to an $L$-dimensional vector $z$ followed by a linear discriminant function:

$$y = \mathrm{sgn}\{w^t z(x) + w_0\} = \mathrm{sgn}\{w_1 z_1 + w_2 z_2 + \cdots + w_L z_L + w_0\} \tag{3}$$

where $z(x) = (z_1(x), z_2(x), \ldots, z_L(x))^t$, $w$ is an $L$ dimensional weighting vector and $z(x)$ is an $L$ dimensional vector derived from $x$. The storage capacity in this case is equal to the capacity of the second layer, $L + 1$ (Cover, 1965), if the samples $z$ are in general position, whereas the upper bound on the capacity from (1) is $(L + 1)\log_2 K$. The inefficiency in this case is $\log_2 K$ bits, the same as for the linear discriminant function, even though the capacity can be raised arbitrarily by increasing $L$. It is not known what the exact relationship between $L$ and $K$ is, that is, we do not know whether for higher dimensions we need better resolution for the values of the weights to be capable of implementing a fixed fraction of the linear mappings. Recently, Mok and Psaltis (personal communication) have found the asymptotic (large $N$) statistical capacity to be $C =$ [...] for a linear discriminant function with binary weights. This result implies that even for large $N$, for the vast majority of linear dichotomies, a large number of levels is not required. Therefore a $\Phi$-function is an effective and straightforward method for increasing the capacity of an associative memory without loss in efficiency.

A higher order associative memory is an array of $\Phi$-functions with the mappings $z(x)$ being polynomial expansions of the vector $x$. The schematic diagram of a higher order associative memory is shown in Figure 2. When the polynomial expansion is of the rth order in $x$ then the output vector $y$ is given by

$$y_l = \mathrm{sgn}\{W_l^r(x, x, \ldots, x) + W_l^{r-1}(x, \ldots, x) + \cdots + W_l^2(x, x) + W_l^1 x + w_{l0}\} \tag{4}$$

where $l = 1, \ldots, N_0$, $W_l^k$ is a $k$-linear symmetric mapping and $W_l^1$ is equivalent to $w^t$ in (2). According to (3)

$$z_j(x) = x_{p_1(j)}^{n_1} x_{p_2(j)}^{n_2} \cdots x_{p_r(j)}^{n_r} \tag{5}$$

where $j = 1, 2, \ldots, L$, $p_i(j) \in \{1, 2, \ldots, N\}$, such that all the $z_j$ are distinct, and $n_1, n_2, \ldots, n_r = 0, 1$. Then $L$ is $\binom{N+r}{r}$ (Cover, 1965), and hence the capacity bound is $\{\binom{N+r}{r} + 1\}\log_2 K$ as before. For example, if $r = 2$, the function becomes quadratic and has the form $y_l = x^t W_l^2 x + W_l^1 x + w_{l0}$ and the number of non-redundant terms in the quadratic expansion is $L = (N + 1)(N + 2)/2$.

The components of the vector $z$ are binary-valued if $x$ is binary. In this case, the samples cannot be assumed to be in general position since there are at most $N + 2$ binary vectors in $N$ dimensional space which lie in general position. We will evaluate the effectiveness of higher order mappings in producing representations $z(x)$ that are separable by the second layer of weights by calculating the Hamming distance between $z$ vectors given the Hamming distance between the corresponding $x$ vectors. We expect that if the Hamming distance between two binary vectors is large then they are easy to distinguish from one another.

2.1  Complete  Polynomial  Expansion  of  Binary
Vectors

There are at most $2^N$ non-redundant terms in any polynomial expansion (4) of a binary vector $x$ in $N$ dimensions. First, we will consider the following $N$th order expansion (or equivalently bit production) for the bipolar vectors $x$ in $N$ dimensional binary space $\{1, -1\}^N$:

$$z = z(x) = (1, x_1, x_2, \ldots, x_N, x_1 x_2, \ldots, x_1 x_2 \cdots x_N)^t. \tag{6}$$

If we apply a linear discriminant function to the new vectors $z$, then the capacity becomes $2^N$, which is equal to the total number of possible input vectors (Psaltis & Park, 1986). In other words this memory is capable of performing any mapping of $N$ binary variables to any binary output vector $y$. Of course the number of weights that are needed to implement this memory grows to $2^N$ times $N_0$, the number of bits at the output. In what follows we show that in this extreme case the vectors $z$ become orthogonal to each other.

Theorem 2. If we expand binary vectors $x^m$, $m = 1, 2, \ldots, 2^N$, in $X = \{1, -1\}^N$ to $2^N$ dimensional binary vectors $z^m$ according to (6), where $N$ is the dimensionality of the original feature vectors, then (a) $\langle z^m, z^{m'} \rangle = 2^N \delta_{mm'}$, where $\langle \cdot, \cdot \rangle$ is an inner product, (b) $\sum_i z_i^m = 0$, and (c) $\sum_m z_i^m = 0$.

Example:  Table  '  is  for  the  case  oSN  •  Z.  Note  the 
orthogonality and the numbers of 1s and -1s in the new vectors and the set of each component of them except the first vector and the set of the first components.

Proof: (a) Let us consider any two different binary vectors in the binary space $\{1, -1\}^N$ whose Hamming distance is $n$ ($1 \le n \le N$). When they are expanded to two $2^N$ dimensional binary vectors, the number of $k$th order terms that have opposite signs in the two expansions is

$$\sum_{i\ \mathrm{odd}} \binom{n}{i}\binom{N-n}{k-i}. \tag{7}$$

Notice that two monomials have different values if, and only if, they have an odd number of factors whose signs are opposite. The Hamming distance between the two fully expanded (up to order $N$) vectors can be calculated by adding the number of terms that have different signs over all the orders of the expansion:

$$\sum_{k=1}^{N} \sum_{i\ \mathrm{odd}} \binom{n}{i}\binom{N-n}{k-i} = \Big(\sum_{i\ \mathrm{odd}} \binom{n}{i}\Big)\Big(\sum_{j=0}^{N-n} \binom{N-n}{j}\Big) = 2^{n-1}\, 2^{N-n} = 2^{N-1}. \tag{8}$$

The fact that the Hamming distance is $2^{N-1}$ for any two expanded vectors (for any $n$) proves that all of the $2^N$ vectors become orthogonal and that $\langle z^m, z^{m'} \rangle = 2^N \delta_{mm'}$. (b) Just think of the cases where one of the two vectors is $(1, 1, \ldots, 1)$. Then, all the other vectors $z$ have equal numbers of 1s and -1s because their Hamming distances are all $2^{N-1}$ from the $(1, 1, \ldots, 1)$ vector. (c) See Duda and Hart (1973).
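The three properties of Theorem 2 can be checked by brute force for a small case. The following is a minimal sketch; the helper name `full_expansion`, the ordering of the terms within $z$, and the choice $N = 3$ are assumptions made here for illustration.

```python
from itertools import combinations, product

def full_expansion(x):
    """Eq. (6): products over every subset of the components of x (2^N terms).

    The empty subset gives the constant term 1, which is placed first.
    """
    z = []
    for k in range(len(x) + 1):
        for idx in combinations(range(len(x)), k):
            term = 1
            for i in idx:
                term *= x[i]
            z.append(term)
    return z

N = 3
xs = list(product((1, -1), repeat=N))      # all 2^N bipolar inputs; xs[0] is all ones
zs = [full_expansion(x) for x in xs]

# (a) Gram matrix of the expanded vectors
gram = [[sum(a * b for a, b in zip(zm, zn)) for zn in zs] for zm in zs]
# (b) sum of the components of each expanded vector
row_sums = [sum(z) for z in zs]
# (c) sum of each component over all expanded vectors
col_sums = [sum(z[i] for z in zs) for i in range(2 ** N)]

print(gram[0][:4], row_sums, col_sums)
```

The Gram matrix comes out as $2^N$ times the identity, and the row and column sums vanish except for the first vector and the first component, as noted in the example following the theorem.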

Slepian  has  discussed  this  orthogonalization  property 
as  a  method  for  designing  orthogonal  codes  and  has 
given a different proof for it (Slepian, 1956). The proof
presented  here  is  useful  for  characterizing  higher  order 
memories  because  it  allows  us  to  trace  the  contribution 
of  each  order  of  the  expansion  to  the  orthogonalization 
and  immediately  derive  results  about  the  properties  of 
quadratic and cubic memories. The output vector $y$ is

$$y_l = \mathrm{sgn}\{W_l \cdot z\} = \mathrm{sgn}\Big\{\sum_{i=1}^{2^N} w_{li} z_i\Big\} \tag{9}$$

where $l = 1, \ldots, N_0$ and $W_l$ is a $2^N$ dimensional weighting row vector. The matrix $w_{li}$ that can implement the $x^m \to y^m$ mapping for $m = 1$ to $2^N$ can be formed in this case simply as the sum of outer products of $y^m$ and $z^m$:

$$w_{li} = \sum_{m=1}^{2^N} y_l^m z_i^m. \tag{10}$$
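Because the fully expanded vectors are mutually orthogonal, the outer product weights of (10) recall every stored association exactly through (9). The sketch below illustrates this for an arbitrary random mapping; the sizes `N = 4`, `N0 = 3`, the seed, and all function names are illustrative assumptions.

```python
import random
from itertools import combinations, product

def full_expansion(x):
    """Eq. (6): products over every subset of the components of x."""
    z = []
    for k in range(len(x) + 1):
        for idx in combinations(range(len(x)), k):
            term = 1
            for i in idx:
                term *= x[i]
            z.append(term)
    return z

def sgn(v):
    return 1 if v >= 0 else -1

random.seed(0)                         # arbitrary seed, for repeatability only
N, N0 = 4, 3
inputs = list(product((1, -1), repeat=N))
targets = [[random.choice((1, -1)) for _ in range(N0)] for _ in inputs]
zs = [full_expansion(x) for x in inputs]

# Eq. (10): w[l][i] = sum over m of y_l^m * z_i^m
weights = [[sum(y[l] * z[i] for y, z in zip(targets, zs)) for i in range(2 ** N)]
           for l in range(N0)]

# Eq. (9): y_l = sgn(W_l . z); the argument equals 2^N * y_l^m for stored z^m
recalled = [[sgn(sum(w * zi for w, zi in zip(weights[l], z))) for l in range(N0)]
            for z in zs]
print(recalled == targets)
```

The number of weights is $2^N N_0$, which is what makes this extreme case impractical for large $N$.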


2.2  Expansions  of  a  Single  Order

The orthogonalization property of the full expansion is interesting because it shows that higher order memories provide a complete framework that takes us from the simplest "neuron," the linear discriminant function, to the full capability of a Boolean look-up table. Higher order memories can indeed provide a valuable tool for designing digital programmable logic arrays. In this paper, however, we are interested in associative memories that are capable of accepting inputs with large $N$ (e.g., if $N = 10^2$ then $2^N \approx 10^{30}$) in which case considering a full expansion of the input data is completely out of the question. In such cases we are really interested in an expansion that contains a large enough number of terms to provide the capacity needed to learn the problem at hand. In this subsection we analyze the properties of partial expansions that include all the terms of one order.

We  will  first  consider  the  memory  consisting  of  all 
the  terms  of  a  quadratic  expansion  with  binary  input 
vectors. 

$$y_l = \mathrm{sgn}\Big\{\sum_i \sum_j w_{lij} x_i x_j\Big\} = \mathrm{sgn}\Big\{\sum_{k=1}^{L} w'_{lk} z_k\Big\} \tag{11}$$

The number of non-redundant terms in a quadratic expansion of a binary vector is $L = N(N - 1)/2$. Let two input vectors have a Hamming distance $n$. The angle $\theta_1$ between these two vectors is given by the relation $\cos\theta_1 = 1 - (2n/N)$. The angle $\theta_2$ between the corresponding $z(x)$ vectors can be readily calculated since we know their Hamming distance from the proof of Theorem 2(a):

$$\cos\theta_2 = 1 - \frac{4n(N - n)}{N(N - 1)} \approx 1 - 4\rho + 4\rho^2 = (1 - 2\rho)^2 \tag{12}$$


where $\rho = n/N$. $\theta_2$ and $\theta_1$ are plotted versus $\rho$ in Figure 3a. For $\rho < .5$, $\theta_2$ is always larger than $\theta_1$. Specifically, for $\rho \ll 1$, $\theta_2 = \sqrt{2} \times \theta_1$. We see therefore that the quadratic mapping not only expands the dimensionality, which provides capacity, but also spreads the input samples apart, a generally desirable property. For $\rho > .5$ the quadratically expanded vectors are closer to each other than the original vectors and in the extreme case $n = N$, $\theta_2$ becomes zero. This insensitivity of the quadratic mapping to a change in sign of all the bits is a property that is shared by all even order expansions.
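The exact expression in (12) can be confirmed by expanding a pair of vectors explicitly. The following sketch uses exact rational arithmetic; `quadratic_expansion`, `cosine` and the choice $N = 20$ are illustrative assumptions, and only the $N(N-1)/2$ non-redundant products $x_i x_j$ with $i < j$ are kept.

```python
from fractions import Fraction
from itertools import combinations

def quadratic_expansion(x):
    """Non-redundant second order terms x_i * x_j, i < j, of a bipolar vector."""
    return [x[i] * x[j] for i, j in combinations(range(len(x)), 2)]

def cosine(u, v):
    """Cosine of the angle between two +/-1 vectors of equal length."""
    return Fraction(sum(a * b for a, b in zip(u, v)), len(u))

N = 20
x = [1] * N
rows = []
for n in range(N + 1):
    # Flip the first n bits to obtain a vector at Hamming distance n from x
    xp = [-v if i < n else v for i, v in enumerate(x)]
    c1 = cosine(x, xp)
    c2 = cosine(quadratic_expansion(x), quadratic_expansion(xp))
    rows.append((n, c1, c2))

for n, c1, c2 in rows:
    print(n, float(c1), float(c2))
```

At $n = N$ the quadratic cosine returns to 1, which is the sign insensitivity of even order expansions mentioned in the text.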
Next we consider a cubic memory

$$y_l = \mathrm{sgn}\Big\{\sum_i \sum_j \sum_k w_{lijk} x_i x_j x_k\Big\} = \mathrm{sgn}\Big\{\sum_{m=1}^{L} w'_{lm} z_m\Big\} \tag{13}$$

where $L = \binom{N}{3} + N$. In Figure 3b we plot $\theta_3$, the angle between two cubically expanded binary vectors, as a function of $\rho$. For convenience, $\theta_1$ is also plotted in the same figure. In this case $\theta_3$ increases faster with $\rho$ for $\rho < .5$. For $\rho \ll 1$, $\theta_3 = \sqrt{3} \times \theta_1$. At $\rho \approx .4$ the cubic expansion gives essentially perfectly orthogonal vectors, while for $\rho > .5$, $\theta_3$ remains smaller than $\theta_1$ and in the limit $\rho = 1$, $\theta_3 = \pi$. Thus the cubic memory discriminates between a vector and its complement.


The basic trends that are evident in the quadratic and cubic memories generalize to any order $r$. The number of independent terms in the rth order expansion of a binary vector is $\binom{N}{r}$, which is maximum for $r = N/2$. Again this is not of practical importance because the number of terms in a full expansion of this sort is prohibitively large. What is of interest however is the effectiveness with which relatively small order expansions can orthogonalize a set of input vectors. The angle $\theta_r$ between two vectors that have been expanded to the rth order is given by the following relation:

$$\cos\theta_r = \frac{\binom{N}{r} - 2\sum_{i\ \mathrm{odd}} \binom{n}{i}\binom{N-n}{r-i}}{\binom{N}{r}} \tag{14}$$


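We can obtain a simpler expression for the interesting case $r \ll N$, and for small $\rho$, $\theta_r \approx \sqrt{r} \times \theta_1$.

Proposition 3: For $r \ll N$,

$$\cos\theta_r \approx (1 - 2\rho)^r. \tag{15}$$

Moreover, for small $\rho$,

$$\theta_r \approx \sqrt{r}\,\theta_1 \tag{16}$$

where $\theta_1 \approx 2\sqrt{\rho}$.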

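Proof: For a small $r$, we can make the approximations $\binom{N}{r} \approx N^r/r!$, $\binom{n}{i} \approx n^i/i!$, and $\binom{N-n}{r-i} \approx (N - n)^{r-i}/(r - i)!$. Then, $\cos\theta_r$ is approximated as follows:

$$\cos\theta_r \approx 1 - 2\sum_{i\ \mathrm{odd}} \binom{r}{i} \rho^i (1 - \rho)^{r-i} = (1 - 2\rho)^r$$

because of these relationships, in which each sum runs over the terms $\binom{r}{i}\rho^i(1-\rho)^{r-i}$:

$$\sum_{i\ \mathrm{odd}} + \sum_{i\ \mathrm{even}} = (1 - \rho + \rho)^r = 1,$$

$$\sum_{i\ \mathrm{odd}} - \sum_{i\ \mathrm{even}} = -(1 - \rho - \rho)^r = -(1 - 2\rho)^r.$$

When $\rho \ll 1$, $\cos\theta_r$, which is approximately $1 - \theta_r^2/2!$, is approximated by $1 - 2r\rho$ directly from (14) or from (15). Therefore, (16) follows, that is, $\theta_r \approx 2\sqrt{r\rho}$.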

We plot $\theta_r$ versus $\rho$ for selected orders in Figure 4 using (15). It is evident that increasing $r$ results in better separated feature vectors. Polynomial mappings act as an effective mechanism for increasing the dimensionality of the space in which inputs are classified because they guarantee a very even distribution of the samples in this new space.
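To get a feel for how close the approximation (15) is to the exact relation (14), the two can be evaluated side by side. This is a minimal sketch; the function names and the sample values $N = 1000$, $n = 50$ and $n = 5$ are assumptions chosen here for illustration.

```python
from math import acos, comb, sqrt

def cos_theta_exact(N, n, r):
    """Eq. (14): cosine of the angle between two r-th order expansions."""
    opposite = sum(comb(n, i) * comb(N - n, r - i) for i in range(1, r + 1, 2))
    return 1 - 2 * opposite / comb(N, r)

def cos_theta_approx(N, n, r):
    """Eq. (15): (1 - 2*rho)^r with rho = n/N, intended for r << N."""
    return (1 - 2 * n / N) ** r

N = 1000
for r in (1, 2, 3, 5):
    print(r, cos_theta_exact(N, 50, r), cos_theta_approx(N, 50, r))

# Small-angle check of (16): theta_r / theta_1 should be close to sqrt(r)
theta1 = acos(cos_theta_exact(N, 5, 1))
theta3 = acos(cos_theta_exact(N, 5, 3))
print(theta3 / theta1, sqrt(3))
```

For these values the exact and approximate cosines agree to about three decimal places, and the ratio of angles is within a fraction of a percent of $\sqrt{3}$.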


3.  TRAINING  OF  HIGHER 
ORDER  MEMORIES 

Once the initial polynomial mapping has been selected, the rest of the system in a higher order memory is simply a linear discriminant function. As such it can be trained by any of the existing methods for training linear discriminant functions. For instance the pseudoinverse (Kohonen, 1984; Venkatesh & Psaltis, in press) can be used to calculate the set of weights that will map a set of $L$-dimensional expanded vectors $z^m$ to the associated output vectors $y^m$. Alternatively, error driven algorithms such as the perceptron or adaline can be used to iteratively train the memory by repeatedly presenting the input vectors to the system, monitoring the output to obtain an error signal, and modifying the weights so as to gradually decrease the error.

FIGURE 3. (a) The angle between linearly and quadratically expanded vectors as a function of the Hamming distance at the input; (b) The angle between linearly and cubically expanded vectors as a function of the Hamming distance at the input.

FIGURE 4. The angle between expanded vectors for selected orders.


The relative ease with which higher order memories can be trained is a very important advantageous feature of this approach. A higher order memory is basically a multilayered network where the first layer is selected a priori. In terms of capacity alone, there is no advantage whatsoever in having multiple layers with modifiable weights. From Theorem 1 we know that at best the capacity is determined by the number of modifiable weights. For a higher order memory we get the full advantage of the available degrees of freedom, whereas if we put the same number of weights in multiple layers the resulting degeneracies will decrease the capacity. The relative advantage of trainable multiple layers is the potential for generalization that emerges through the learning process. The generalization properties of higher order memories on the other hand are mostly determined by the choice of the terms used in the polynomial expansion in the fixed first layer. Thus the generalization properties of these memories as described in this paper are imposed a priori by the designer of the system.

The sum of outer products algorithm that has been used extensively for training linear associative memories can also be used for training the higher order memories and this algorithm generalizes to the higher order case in particularly interesting ways. In addition, this particular learning algorithm is predominantly used for the holographic optical implementations that are described in the following section. Therefore we will discuss in some detail the properties of higher order memories that are trained using this rule.


3.1  The  Outer  Product  Rule 

Let us consider associative memories constructed as an expansion of the rth order only, with input samples in an $N$ dimensional binary space and $r \ge 1$:

$$y_l = \mathrm{sgn}\Big\{\sum_{j_1, j_2, \ldots, j_r} w_{l j_1 j_2 \cdots j_r} x_{j_1} x_{j_2} \cdots x_{j_r} + w_{l0}\Big\} \tag{17}$$

where $1 \le j_1, \ldots, j_r \le N$ and $1 \le l \le N_0$. The number of independent terms $L$ in the rth order expansion is $\binom{N+r-1}{r}$, which for $r \ll N$ can be approximated by $N^r/r!$. The expression for the weights of the rth order expansion using the sum of outer products algorithm is (Chen et al., 1986; Psaltis & Park, 1986)

$$w_{l j_1 j_2 \cdots j_r} = \sum_{m=1}^{M} y_l^m x_{j_1}^m x_{j_2}^m \cdots x_{j_r}^m \tag{18}$$

where $M$ is the number of vectors stored in the memory and $y^m$ is an output vector associated with an input vector $x^m$ as before. With the above expression for the weight tensor, (17) can be rewritten as follows:

$$y_l = \mathrm{sgn}\Big\{\sum_{m=1}^{M} y_l^m \Big(\sum_{j=1}^{N} x_j^m x_j\Big)^r + w_{l0}\Big\}. \tag{19}$$

The above equation suggests an alternate implementation for higher order memories that are trained using the outer product rule. This is shown schematically in Figure 5. The inner products between the input vector and all the stored vectors $x^m$ are formed first, then raised to the rth power, and the signal from the $m$th unit is connected to the output through interconnective weights $y_l^m$. If $y^m = x^m$ then the memory is autoassociative, and in this case the output can be fed back to the input, resulting in a system whose stable states are programmed to be the vectors $x^m$. This becomes a direct extension of the Hopfield network (Anderson, 1983; Hopfield, 1982; Nakano, 1972) to the higher order case. Assuming that $x = x^n$ is one of the stored vectors, $y_l$ becomes

$$y_l = \mathrm{sgn}\Big\{N^r y_l^n + \sum_{m \ne n} y_l^m \Big(\sum_{j=1}^{N} x_j^m x_j^n\Big)^r\Big\} = \mathrm{sgn}\{N^r y_l^n + n_l(x^n)\} \tag{20}$$

where the first term is the desired signal term and $n_l$ is a noise term. The threshold weight is set to zero.
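The correlate, raise to the rth power, and sum structure of (19) is straightforward to simulate. The sketch below is an illustration under stated assumptions: the names `recall`, `stored` and `probe`, the sizes $N = 64$, $M = 4$, $r = 3$, the seed and the five flipped bits are all choices made here, and the threshold weight is taken to be zero. With these sizes the signal term dominates the noise term by a wide margin, so a single pass is expected to restore the stored vector, although this is a statistical statement rather than a guarantee for arbitrary data.

```python
import random

def sgn(v):
    return 1 if v >= 0 else -1

def recall(x, stored_in, stored_out, r):
    """Eq. (19) with zero threshold.

    Each stored input is correlated with x, the inner product is raised to
    the r-th power, and the results weight the associated output vectors.
    """
    corr = [sum(a * b for a, b in zip(xm, x)) ** r for xm in stored_in]
    return [sgn(sum(c * ym[l] for c, ym in zip(corr, stored_out)))
            for l in range(len(stored_out[0]))]

random.seed(1)                                  # arbitrary seed
N, M, r = 64, 4, 3
stored = [[random.choice((1, -1)) for _ in range(N)] for _ in range(M)]

# Autoassociative use: outputs are the stored inputs themselves
clean = recall(stored[0], stored, stored, r)

probe = list(stored[0])
for i in random.sample(range(N), 5):            # corrupt five bits
    probe[i] = -probe[i]
restored = recall(probe, stored, stored, r)

print(clean == stored[0], restored == stored[0])
```

Only $M$ inner products are needed per recall, instead of the roughly $N^r$ multiplications implied by the weight tensor of (17).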

The expectation value of $n_l(x^n)$ is zero if the bits that comprise the stored binary input and output vectors are drawn randomly and independently having equal probability of being +1 or -1. If this is the case then

$$E(x_j^m x_{j'}^{m'}) = \delta_{mm'}\,\delta_{jj'} \tag{21}$$

where $\delta_{jj'}$ is the Kronecker delta function. The variance of $n_l$ is calculated as follows:

$$E(n_l^2) = E\Big(\sum_{m \ne n} \sum_{m' \ne n} y_l^m y_l^{m'} \sum_{j_1 \cdots j_r} \sum_{j'_1 \cdots j'_r} x_{j_1}^m \cdots x_{j_r}^m\, x_{j_1}^n \cdots x_{j_r}^n\, x_{j'_1}^{m'} \cdots x_{j'_r}^{m'}\, x_{j'_1}^n \cdots x_{j'_r}^n\Big)$$

$$= E\Big(\sum_{m \ne n} \sum_{j_1 \cdots j_r} \sum_{j'_1 \cdots j'_r} x_{j_1}^m \cdots x_{j_r}^m\, x_{j'_1}^m \cdots x_{j'_r}^m\, x_{j_1}^n \cdots x_{j_r}^n\, x_{j'_1}^n \cdots x_{j'_r}^n\Big). \tag{22}$$

In the above we used the facts that different stored vectors are uncorrelated (i.e., the terms with $m \ne m'$ vanish) and $(y_l^m)^2 = 1$. Then, the variance becomes $(M - 1)Q(N, r)$, where $Q(N, r)$ is the number of possible permutations such that

$$\delta_{i_1 t_1} \delta_{i_2 t_2} \cdots \delta_{i_r t_r} = 1 \tag{23}$$

where the set of variables $\{i_1, \ldots, i_r, t_1, \ldots, t_r\}$ spans all the combinations produced by the set of variables $\{j_1, \ldots, j_r, j'_1, \ldots, j'_r\}$. The variance can be calculated exactly for the cases $r = 1$, 2, and 3 and it is $(M - 1)N$, $(M - 1)(3N^2 - 2N)$ and $(M - 1)(15N^3 - 30N^2 + 16N)$, respectively. For the general case we will derive lower and upper bounds which for large $N$ provide us with a good estimate of the variance for any order $r$.
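The exact counts quoted for $r = 1$, 2 and 3 can be reproduced by enumeration. The sketch below assumes a particular reading of condition (23): an index tuple $(j_1, \ldots, j_r, j'_1, \ldots, j'_r)$ is counted when every index value occurs an even number of times, which is exactly when the expectation of the corresponding product of independent random ±1 bits equals 1. The function name `Q_bruteforce` is hypothetical.

```python
from collections import Counter
from itertools import product

def Q_bruteforce(N, r):
    """Count 2r-tuples over N index values in which every value occurs evenly."""
    count = 0
    for idx in product(range(N), repeat=2 * r):
        if all(c % 2 == 0 for c in Counter(idx).values()):
            count += 1
    return count

for N in range(1, 5):
    print(N, Q_bruteforce(N, 1), Q_bruteforce(N, 2), Q_bruteforce(N, 3))
```

The enumeration grows as $N^{2r}$, so it is only practical for very small $N$, but that is enough to confirm the polynomials $N$, $3N^2 - 2N$ and $15N^3 - 30N^2 + 16N$.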
Proposition 4: The total number of permutations, $Q(N, r)$, for which (23) holds, satisfies the following relationship:

$$P(N, r)\frac{(2r)!}{2^r r!} + \binom{r}{2} P(N, r - 1)\frac{(2r - 4)!}{2^{r-2}(r - 2)!} \le Q(N, r) \le N^r \frac{(2r)!}{2^r r!} \tag{24}$$

where $P(m, n) = m!/(m - n)!$.
Proof: The number of ways of making $r$ pairs of $2r$ items is $(2r - 1)(2r - 3) \cdots (3)(1) = (2r)!/2^r r!$. The items that we are concerned with are the variables $i_j$, $t_j$ and each of these variables can take one of $N$ values. We can only select the values of half these variables ($N^r$ possibilities) and for each of these choices we can create $r$ pairs. Hence the upper bound is $N^r (2r)!/2^r r!$. This is an upper bound because we have overcounted for different pairings of variables that have the same value.

The initial lower bound is derived if each pair has a different value from all others, which eliminates the possibility of overcounting. The number of possible ways to satisfy (23) with the variables in any two pairs not taking the same values is $P(N, r)(2r)!/2^r r!$. This is an underestimate because all pairs that contain variables taking the same value should be counted once. We can thus improve the lower bound by counting the number of ways these degenerate pairings occur and adding them into the previous bound. For example, when two pairs out of $r$ have the same values, with $\binom{r}{2}$ choices, there are $\binom{r}{2} N P(N - 1, r - 2)(2r - 4)!/2^{r-2}(r - 2)!$ possible permutations, where $(2r - 4)!/2^{r-2}(r - 2)!$ is the number of ways of making $r - 2$ pairs of $2r - 4$ items. Therefore, $Q(N, r)$ is lower bounded by $P(N, r)(2r)!/2^r r! + \binom{r}{2} P(N, r - 1)(2r - 4)!/2^{r-2}(r - 2)!$, since $N P(N - 1, r - 2) = P(N, r - 1)$.
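The bounds of (24) can be compared against the exact values stated in the text for $r = 1$, 2 and 3. In the sketch below the helper names (`P`, `pairings`, `lower_bound`, `upper_bound`, `exact`) and the sample $N = 100$ are illustrative assumptions.

```python
from math import comb, factorial

def P(m, n):
    """Number of permutations m!/(m-n)!, for m >= n."""
    return factorial(m) // factorial(m - n)

def pairings(r):
    """Number of ways of making r pairs out of 2r items: (2r)!/(2^r r!)."""
    return factorial(2 * r) // (2 ** r * factorial(r))

def lower_bound(N, r):
    lb = P(N, r) * pairings(r)
    if r >= 2:                      # degenerate term: two pairs share a value
        lb += comb(r, 2) * P(N, r - 1) * pairings(r - 2)
    return lb

def upper_bound(N, r):
    return N ** r * pairings(r)

# Exact counts quoted in the text for r = 1, 2, 3
exact = {
    1: lambda N: N,
    2: lambda N: 3 * N ** 2 - 2 * N,
    3: lambda N: 15 * N ** 3 - 30 * N ** 2 + 16 * N,
}

N = 100
for r in (1, 2, 3):
    print(r, lower_bound(N, r), exact[r](N), upper_bound(N, r))
```

For $r = 1$ and $r = 2$ the lower bound coincides with the exact count, and for $r = 3$ both bounds are within a few percent of it at $N = 100$, which supports using the upper bound as an estimate when $r \ll N$.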
We can get a very good approximation to the SNR using the approximations $M - 1 \approx M$ and $Q(N, r) \approx N^r (2r)!/2^r r!$, which are very nearly true for the interesting case $r \ll N$:

$$\mathrm{SNR} \approx \frac{N^r}{\{M N^r (2r)!/2^r r!\}^{1/2}} = \left(\frac{N^r 2^r r!}{M (2r)!}\right)^{1/2}. \tag{25}$$

For example, the linear memory, $r = 1$, has a SNR of $(N/M)^{1/2}$, the quadratic memory, $r = 2$, a SNR of $N/(3M)^{1/2}$ and the cubic memory, $r = 3$, a SNR of $(N^3/15M)^{1/2}$. We can obtain an estimate for the capacity of an rth order memory by equating the signal to noise ratios of the linear and rth order memories and solving for the number of stored vectors that will yield the equality. For $r$ small compared to $N$, we obtain

$$M_r = \frac{N^{r-1} 2^r r!}{(2r)!}\, M_1. \tag{26}$$

Comparing its value with the capacity $M_1$ of a linear memory we can obtain the relationship between the capacities, that is, $M_r/M_1 = N^{r-1} 2^r r!/(2r)!$. For example $M_2$ of a quadratic memory is $M_1 N/3$ and $M_3$ of a cubic memory is $M_1 N^2/15$.

The diagonal terms in a higher order memory $w_{l j_1 j_2 \cdots j_r}$ can be defined as those of which all the indexes $j$ are not different. We form the weight tensor with zero diagonal as follows:

$$w_{l j_1 j_2 \cdots j_r} = \begin{cases} \sum_{m} y_l^m x_{j_1}^m x_{j_2}^m \cdots x_{j_r}^m & \text{if } j_1, j_2, \ldots, j_r \text{ are different} \\ 0 & \text{otherwise.} \end{cases} \tag{27}$$

When the input is one of the stored vectors $x^n$ and the weight tensor has zero diagonal, the output $y_l$ becomes

$$y_l = \mathrm{sgn}\Big\{\sum_{j_1, \ldots, j_r} w_{l j_1 \cdots j_r} x_{j_1}^n x_{j_2}^n \cdots x_{j_r}^n + w_{l0}\Big\} = \mathrm{sgn}\Big\{P(N, r)\, y_l^n + \sum_{m \ne n} y_l^m \sum_{j_1, \ldots, j_r\ \mathrm{different}} x_{j_1}^m \cdots x_{j_r}^m\, x_{j_1}^n \cdots x_{j_r}^n + w_{l0}\Big\} \tag{28}$$

where the first term is a signal term and the second a noise term as before. The variance of the noise term is easily shown to be $(M - 1)P(N, r)\,r!$ using (21). Therefore, the SNR becomes

$$\mathrm{SNR} = \frac{P(N, r)}{\{(M - 1)P(N, r)\,r!\}^{1/2}} = \left(\frac{P(N, r)}{(M - 1)\,r!}\right)^{1/2} \tag{29}$$

which can be approximated as $(N^r/M r!)^{1/2}$ for $r \ll N$.

Chen and his coworkers (1986) introduced an energy function (Cohen & Grossberg, 1983; Hopfield, 1982) for the rth order autoassociative memory with feedback and outer products as follows:

$$E_r = -\sum_{m} \langle x^m, x \rangle^{r+1} \tag{30}$$

where $\langle \cdot, \cdot \rangle$ denotes an inner product of two vectors. The change in the energy due to a change $\delta x$ in the state of the network was shown by Chen et al. (1986) to be decreasing for odd $r$:

$$\delta E_r = E_r(x + \delta x) - E_r(x) = -(r + 1)\sum_l \delta x_l \sum_{j_1, \ldots, j_r} w_{l j_1 \cdots j_r} x_{j_1} x_{j_2} \cdots x_{j_r} - R_r \tag{31}$$

where

$$R_r = \sum_m \sum_{j=2}^{r+1} \binom{r+1}{j} \langle x^m, x \rangle^{r+1-j} \langle x^m, \delta x \rangle^{j}. \tag{32}$$

The first term in (31) is always nonpositive because of the specification of the update rule: $\delta x_l \ge 0$ if $\sum_{j_1, \ldots, j_r} w_{l j_1 \cdots j_r} x_{j_1} x_{j_2} \cdots x_{j_r} \ge 0$ and vice versa. Chen et al. (1986) showed that the second term is also nonpositive by showing that $R_r$ is an increasing function of $r$ for $r$ odd and $R_1 \ge 0$.

For $r$ even it is possible to prove the autoassociative memory converges only for asynchronous updating, even though in simulations even order autoassociative memories consistently converge as well. The fact that the energy is not always decreasing when $r$ is even may actually be helpful for getting out of local minima and settling in the programmed stable states, which are global minima in a region of the energy surface. A descent procedure that is always decreasing in energy cannot escape local minima since there is no mechanism for climbing out of them. As an example, consider a quadratic memory, that is, $r = 2$ (even), whose energy function is given by

$$E_2 = -\sum_{ijk} w_{ijk} x_i x_j x_k \tag{33}$$

$$\Delta E_2 = -3\sum_{ijk} w_{ijk} x_j x_k\, \delta x_i - 3\sum_{ijk} w_{ijk} x_k\, \delta x_i\, \delta x_j - \sum_{ijk} w_{ijk}\, \delta x_i\, \delta x_j\, \delta x_k. \tag{34}$$

The first term is nonincreasing but the second and third terms can be increasing. If the vector $x$ is very close to one of the stored vectors $x^m$ then the first term becomes dominant and the energy will be very likely to be nonincreasing, causing the system to settle at $x = x^m$. If $x$ is not close to any of the stored vectors, then all three terms in the above equations are on the average comparable to each other, and since two of them are not necessarily nonincreasing the energy function may be increasing and it is possible to escape from local minima.
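The behavior of the energy (30) under feedback can be observed in a small simulation. The sketch below is a minimal illustration for an odd order, assuming asynchronous updates in which one bit at a time is set to the sign of its input and is left unchanged when that input is exactly zero; the names `energy` and `async_sweep`, the sizes, and the seed are assumptions made here. The full weight tensor (diagonal terms included) is used implicitly through the inner product form of (19).

```python
import random

def inner(a, b):
    return sum(p * q for p, q in zip(a, b))

def energy(x, stored, r):
    """Eq. (30): E_r = -sum over m of <x^m, x>^(r+1)."""
    return -sum(inner(xm, x) ** (r + 1) for xm in stored)

def async_sweep(x, stored, r):
    """Update the bits of x in place, one at a time.

    Returns the list of energies before the sweep and after every
    single-bit update.
    """
    energies = [energy(x, stored, r)]
    for l in range(len(x)):
        # Input to unit l from the autoassociative r-th order memory
        field = sum(xm[l] * inner(xm, x) ** r for xm in stored)
        if field != 0:
            x[l] = 1 if field > 0 else -1
        energies.append(energy(x, stored, r))
    return energies

random.seed(2)                                   # arbitrary seed
N, M, r = 16, 3, 3                               # r odd
stored = [[random.choice((1, -1)) for _ in range(N)] for _ in range(M)]
state = [random.choice((1, -1)) for _ in range(N)]

energies = async_sweep(state, stored, r) + async_sweep(state, stored, r)[1:]
print(energies[0], energies[-1])
```

For odd $r$ the recorded energies never increase, since $s^{r+1}$ is then a convex function of each inner product $s$ and every update moves a bit in the direction of its input. Repeating the experiment with an even $r$ is one way to look for the occasional energy increases discussed above.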
FIGURE 6. Holographic recording and reconstruction. (a) Recording, (b) reconstruction.


4.  OPTICAL  IMPLEMENTATIONS  OF
QUADRATIC  ASSOCIATIVE  MEMORIES

The outer product quadratic associative memories described in the previous section require three basic components for their implementation: interconnective weights, a square-law device, and a threshold nonlinearity. In this section, we present a variety of optical implementations using either planar or volume holograms to provide the interconnection pathways and optical or electro-optical devices to provide the required nonlinearities.

Since holographic techniques are used to implement the required interconnections, we will first briefly discuss holography (Collier, Burkhardt, & Lin, 1971) and in particular the distinction between the use of planar versus volume holograms. The holographic process is shown schematically in Figure 6. In the recording step (Figure 6a) the interference between the reference plane wave that is created by collimating the light from a point source using a lens and the wave originating from the object "A" is recorded on a planar light sensitive medium such as a photographic plate. When the developed plate is illuminated with the same reference wave, the field that is diffracted by the recorded interference pattern gives a virtual image of the original object which can be converted to a real image with a lens. The reconstruction of the hologram is thus equivalent to interconnecting the single point from which the plane wave reference is derived to all the points that comprise the reconstructed image. The weight of each interconnection is specified by the interference pattern stored in the hologram.

Volume holograms are prepared and used in the same manner except that whereas a planar hologram records the interference pattern as a two dimensional pattern on a plane, a volume hologram records the interference pattern throughout the volume of a three dimensional medium. The disparity in the dimensionalities of the two storage formats results in marked differences in the capabilities of the two processes. This difference is explained with the aid of Figures 7a and 7b where the reconstruction of both a planar and a volume hologram are shown. Each hologram is prepared to store the two images “A” and “B” by double exposure with each image being associated with a reference plane wave that is incident on the hologram at a different angle. Each reference plane wave is generated by a separate point source and thus the reconstruction of a hologram with the two reference waves is equivalent to interconnecting multiple input points to all the points on the plane of the reconstructed image. In the case of the planar hologram, however, when either one of the reference waves is incident both images are reconstructed. This implies that we cannot in this case independently specify how each of the input points is connected to the output. In contrast, because of the interaction of the fields in the third dimension (Kogelnik, 1969) the volume hologram is able to resolve the differences in the angle of incidence of the reference beam: upon reconstruction, when the reference for “A” illuminates the medium, only “A” is reconstructed, and similarly for the second pattern. When both input points are on simultaneously then each is interconnected to the output independently according to the way it was specified by the recording of the two holograms. Thus volume holograms provide more flexibility for implementing arbitrary interconnections, which translates to efficient three dimensional storage of the interconnective weights needed to specify the quadratic memory.

FIGURE 7. Holographic interconnections using (a) planar versus (b) volume holograms.

Another way in which we can draw the distinction between planar and volume holograms is in terms of the degrees of freedom. The implementation of a quadratic memory whose input word size is $N$ bits requires approximately $N^3$ interconnections for the three dimensional interconnection tensor. The number of degrees of freedom of the planar hologram of area $A$ is upper bounded by $A/\delta^2$ while that of a volume hologram is limited to $V/\delta^3$, where $V$ is the volume of the crystal and $\delta$ is the minimum detail that can be recorded in any one dimension (Psaltis, Yu, Gu, & Lee, 1987; Van Heerden, 1963). Equating the degrees of freedom that are required to do the job to those that are available, the crystal volume is determined to be at least $V = N^3\delta^3$ whereas a planar hologram to do the same job would require a hologram of area $A = N^3\delta^2$. For comparison, a network with $N = 10^3$ can in principle be implemented using a cubic crystal with the length of each side being $l_1 = N\delta = 1$ cm, but a square planar hologram is required to have the length of each side be at least $l_2 = N^{3/2}\delta \approx 0.32$ m at $\delta = 10\ \mu$m. Thus, the volume hologram offers a more compact means of implementing large memory systems.
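The size comparison above can be checked numerically. The following is a minimal sketch that assumes only the two bounds stated in the text (a cube of volume $N^3\delta^3$ and a square of area $N^3\delta^2$); the function and variable names are illustrative and do not come from the paper.

```python
def crystal_side(n, delta):
    """Side of a cube offering n**3 degrees of freedom: V = n^3 * delta^3."""
    return n * delta


def planar_side(n, delta):
    """Side of a square offering n**3 degrees of freedom: A = n^3 * delta^2."""
    return n ** 1.5 * delta


N_BITS = 10 ** 3   # input word size N
DELTA = 10e-6      # minimum recordable detail in metres (10 micrometres)

cube_m = crystal_side(N_BITS, DELTA)    # about 0.01 m, i.e. 1 cm
square_m = planar_side(N_BITS, DELTA)   # about 0.316 m
print(f"crystal side: {cube_m * 100:.2f} cm")
print(f"planar hologram side: {square_m:.3f} m")
```

The ratio of the two side lengths is $N^{1/2}$, so the advantage of the volume medium grows with the word size.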

4.1  Volume  Hologram  Systems 

There are several schemes for fully utilizing the interconnective capability of volume holograms (Psaltis et al., 1987; Psaltis, Brady, & Wagner, in press). For the implementation of quadratic memories we use volume holograms to fully interconnect a 2-D pattern to a 1-D pattern ($N^2 \to N$ mappings) and also the reverse ($N \to N^2$). The geometry for recording the weights for both cases is shown in Figure 8a and the reconstruction geometries are illustrated in Figures 8b and 8c. The circles represent the resolvable spots at the various planes in the system. The waves emanating from each point at the input planes are transformed into plane waves by the Fourier transform lenses $L_1$ and $L_2$ and interfere within the crystal, creating volume gratings.

FIGURE 8. (a) Recording apparatus; (b) $N \to N^2$ mapping; (c) $N^2 \to N$ mapping.

The weights are loaded into the volume hologram with multiple holographic exposures in the system of Figure 8a. In the following subsections we will describe several specific procedures for doing so. For the $N \to N^2$ mapping (Figure 8b), in reading out the stored information, a single source in the input array reconstructs the one of the $N$ 2-D images, each consisting of $N^2$ pixels, that it is associated with. The rest of the images, which belong to the other input points, are not read out because of the angular discrimination of volume holograms. The counterpart to this scheme, shown in Figure 8c, implements an arbitrary $N^2 \to N$ mapping. This setup is basically the same as that of Figure 8b except that the roles of the input planes have been interchanged or, equivalently, the direction in which light propagates has been reversed.
4.1.1 $N^2 \to N$ Schemes. First, we consider a method by which the full three dimensional interconnection tensor is implemented directly with a volume hologram. Recall that if the weight tensor is trained using the sum of outer products then it is given by

$$w_{ijk} = \sum_{m=1}^{M} y_i^m x_j^m x_k^m, \tag{35}$$

where $x^m$ represents the $m$th input memory vector and $y^m$ represents the associated output vector. Such a memory is accessed by first creating an outer product of the input vector and multiplying it with $w_{ijk}$ as follows:

$$y_i = \operatorname{sgn}\Big\{ \sum_{j=1}^{N} \sum_{k=1}^{N} w_{ijk}\, x_j x_k \Big\}. \tag{36}$$
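Equations (35) and (36) can be illustrated numerically. The sketch below is not the optical system; it only evaluates the same sums with NumPy. As an assumption made purely so that the result is deterministic, the stored input vectors are taken to be mutually orthogonal rows of a Sylvester-type Hadamard matrix, in which case recall of a stored item is exact. All function names are illustrative.

```python
import numpy as np


def hadamard(n):
    """Sylvester construction: n x n matrix of +/-1 with orthogonal rows (n a power of 2)."""
    H = np.array([[1]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H


def train_outer_product(X, Y):
    """Eq. (35): w_ijk = sum_m y_i^m x_j^m x_k^m, with X and Y of shape (M, N)."""
    return np.einsum('mi,mj,mk->ijk', Y, X, X)


def recall(W, x):
    """Eq. (36): y_i = sgn(sum_jk w_ijk x_j x_k)."""
    s = np.einsum('ijk,j,k->i', W, x, x)
    return np.where(s >= 0, 1, -1)


N, M = 8, 3
X = hadamard(N)[1:M + 1]                  # assumed orthogonal memory vectors
rng = np.random.default_rng(0)
Y = rng.choice([-1, 1], size=(M, N))      # arbitrary bipolar output vectors
W = train_outer_product(X, Y)
recalled = np.array([recall(W, x) for x in X])
print(np.array_equal(recalled, Y))
```

For non-orthogonal stored vectors the cross terms act as the noise term discussed in the previous section, and recall is only correct with high probability.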

The volume hologram is prepared using the setup in Figure 8a. First, the outer product matrix of the $m$th memory input vector, $x_j^m x_k^m$, is formed on an electronically addressed spatial light modulator (SLM) (Warde & Fisher, 1987). Another one-dimensional SLM whose transmittance represents the $m$th output vector $y^m$ is placed in the other input plane, and the two SLMs are illuminated by coherent light. The transmitted waves are then Fourier transformed by lenses $L_1$ and $L_2$ to interfere within the crystal volume to create index gratings. This procedure is repeated for all $M$ associated input-output pairs so that a sum of $M$ holograms is created in the crystal. For the quadratic outer product memory whose capacity is fully expended, this involves on the order of $N^2/\log N$ exposures.

We will now describe another method for recording the weight tensor in the volume hologram that involves fewer exposures and can be used not only for the outer product scheme but for recording any given weight tensor as well. The same basic recording architecture of Figure 8a is used in this case also. In the first exposure, the top light source in the linear array is turned on while the SLM is programmed with the matrix $w_{1jk}$, where $w_{ijk}$ is the interconnection tensor. When the SLM is illuminated with light coherent with that of the point source, the crystal records the mutual interference pattern as a hologram of the image $w_{1jk}$ with a reference beam that is the plane wave generated from the top light source. In the next step, the second source is turned on while the SLM is programmed with the matrix $w_{2jk}$. In this manner the connectivity for all the points in the linear array at the input is sequentially specified and the memory training is completed when all $N$ exposures have been made. The disadvantage of this method relative to the outer product recording is the need to precalculate the weight tensor electronically, but it has the advantage of fewer exposures ($N$ versus $N^2/\log N$) and greater flexibility in choosing the training method.
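The two recording procedures can be compared in a small numerical sketch. It treats the crystal as an ideal accumulator of the stored tensor, which is an assumption (real exposures also partially erase earlier gratings), and the function names are illustrative.

```python
import numpy as np


def record_outer_products(X, Y):
    """One exposure per stored pair: each adds y_i^m x_j^m x_k^m to the crystal."""
    crystal = np.zeros((Y.shape[1], X.shape[1], X.shape[1]))
    for x, y in zip(X, Y):
        crystal += np.einsum('i,j,k->ijk', y, x, x)
    return crystal, len(X)


def record_slices(W):
    """One exposure per reference source i: stores the N x N image w_ijk for that i."""
    crystal = np.zeros(W.shape)
    for i in range(W.shape[0]):
        crystal[i] += W[i]
    return crystal, W.shape[0]


rng = np.random.default_rng(0)
N, M = 6, 10
X = rng.choice([-1, 1], size=(M, N))
Y = rng.choice([-1, 1], size=(M, N))

by_pairs, n_pair_exposures = record_outer_products(X, Y)
W_precomputed = np.einsum('mi,mj,mk->ijk', Y, X, X)   # electronic precalculation
by_slices, n_slice_exposures = record_slices(W_precomputed)

print(np.array_equal(by_pairs, by_slices), n_pair_exposures, n_slice_exposures)
```

Both routes store the same tensor; the slice method always takes $N$ exposures, while the outer product method takes one exposure per stored pair.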


The architecture in Figure 8c is used to access the data stored in the hologram by either one of the recording methods described above. The electronically addressed 2-D SLM is placed at the input plane and it is programmed with the outer product matrix $x_j x_k$ of the input vector. The light from the $N^2$ input points is interconnected with the $N$ output points via the recorded $w_{ijk}$ interconnect kernel. A linear array of $N$ photodetectors is positioned to sample the output points.

It is important to restate at this juncture that this particular implementation achieves the quadratic interconnections by first transforming the $N$ input features (i.e., the $N$ elements of the input vector $x_j$) into a set of $N^2$ features via the outer product operation. The result is that although the interconnections are quadratic with respect to the $N$ original feature points, they are linear with respect to the $N^2$ transformed features. This allows the application of error driven learning algorithms for linear networks such as the Adaline (Widrow & Hoff, 1960) where the interconnections are developed by an iterative training process. The operation of such a learning scheme is illustrated in Figure 9 which is the same basic architecture as Figure 8c with feedback from the output back into one of the input ports. Each iteration consists of a reading and a writing phase. During the reading phase, the interconnections present in the crystal are interrogated with a particular item to be memorized by illuminating the 2-D SLM which contains the outer product matrix $x_j^m x_k^m$ and the output is formed on the detector array. In the subsequent writing phase, the error pattern generated by subtracting the actual output from the desired output pattern is loaded into the 1-D SLM and both SLMs (the 2-D SLM still contains $x_j^m x_k^m$) are illuminated with coherent light, forming a set of gratings in addition to the previously recorded gratings. The procedure is iteratively repeated for each item to be memorized until the output error is sufficiently small. This algorithm is a descent procedure designed to minimize the mean squared cost

$$E = \sum_{m=1}^{M} \Big[ \sum_{j=1}^{N} \sum_{k=1}^{N} w_{ijk}\, x_j^m x_k^m - y_i^m \Big]^2$$

by iteratively updating the interconnection values.
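The read/write iteration described above can be sketched as a least-mean-squares update on the outer product features. This is a numerical illustration only: the step size `eta`, the number of passes, and the choice of orthogonal (Hadamard-row) training vectors are assumptions made so that the behaviour is easy to verify, and an ideal crystal that simply adds each new grating is assumed.

```python
import numpy as np


def hadamard(n):
    """Sylvester construction: n x n matrix of +/-1 with orthogonal rows (n a power of 2)."""
    H = np.array([[1]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H


def lms_train(X, Y, eta, epochs):
    """Adaline-style training of w_ijk on the N^2 outer product features."""
    n_out, n_in = Y.shape[1], X.shape[1]
    W = np.zeros((n_out, n_in, n_in))
    for _ in range(epochs):
        for x, y in zip(X, Y):
            feats = np.outer(x, x)                         # contents of the 2-D SLM
            out = np.einsum('ijk,jk->i', W, feats)         # reading phase
            err = y - out                                  # desired minus actual output
            W += eta * np.einsum('i,jk->ijk', err, feats)  # writing phase adds gratings
    return W


def outputs(W, X):
    """Linear (pre-threshold) outputs for every row of X."""
    return np.einsum('ijk,mj,mk->mi', W, X, X)


N, M = 8, 3
X = hadamard(N)[1:M + 1]
Y = np.array([[1, -1, 1, 1, -1, -1, 1, -1],
              [-1, -1, 1, -1, 1, 1, 1, -1],
              [1, 1, -1, -1, 1, -1, -1, 1]])
W = lms_train(X, Y, eta=0.5 / N ** 2, epochs=30)
residual = np.abs(outputs(W, X) - Y).max()
print(residual < 1e-6)
```

With these orthogonal training vectors each pass halves the error of every item, so the residual decays geometrically; for correlated training vectors the decay is slower but the same update applies.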

4.1.2 $N \to N^2$ Schemes. The $N \to N^2$ mapping capability of the volume hologram, which is the inverse of that required for the architectures just described, can also be used to implement quadratic memories and can be generalized for higher order memories. The basic idea behind this scheme is illustrated in Figure 10 which shows the interconnection between the $i$th and $j$th neurons, whose weight $w_{ij}$ is a linear combination of all of the inputs and is described by

$$w_{ij} = \sum_{k=1}^{N} w_{ijk}\, x_k. \tag{37}$$
The overall result is, of course, recognized to be the equation describing the quadratic memory, but the notion of an input dependent weight suggests the implementation shown in Figure 11. The system is basically an optical vector matrix multiplier (Goodman, Dias, & Woody, 1978) in which the matrix is created on an optically addressed SLM by multiplying the input vector with the three-dimensional tensor stored in a volume hologram. The input vector is represented by a one dimensional array of light sources. The portion of the system on the left side of the SLM is the vector matrix multiplier and it works as follows. Light from each input point is imaged horizontally but spread out vertically so that each source illuminates a narrow, vertical area on the 2-D SLM. The reflectance of the SLM corresponds to the matrix of weights $w_{ij}$ in (37). The reflected light from the SLM travels back towards the input and a portion of it is reflected by a beam splitter and then imaged horizontally but focused vertically onto a 1-D output detector array. The output from the detector array represents the matrix vector product between the input vector and the matrix represented by the 2-D reflectance of the SLM. The matrix of weights, in this case, is not fixed but rather computed from the input via a volume hologram by exposing the righthand side of the SLM as shown in the figure. The optical system to the right of the 2-D SLM in Figure 11 is the same as the $N \to N^2$ system of Figure 8b. The volume hologram, which has been prepared to perform the appropriate dimension increasing operation ($N \to N^2$), transforms the light distribution given by its one dimensional array of sources into the input dependent matrix of weights given by (37). This system is functionally equivalent to the previous system except that it does not require a 2-D electronically addressed input SLM; the 1-D devices utilized in this architecture are easier and faster to use in practice. Instead, a 2-D optically addressed SLM is needed, which in practice is simpler to use than electronically addressed devices (it requires less electronics), typically has more pixels, and is potentially much faster. A disadvantage of this method, however, is that it does not lend itself to the direct implementation of the simple outer product training method without the use of an electronically addressed 2-D SLM.
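The equivalence between the input dependent weight matrix of (37) and the direct quadratic form of (36) can be checked numerically. The sketch below uses an arbitrary integer tensor as a stand-in for the stored hologram; the names are illustrative and the code models only the arithmetic, not the optics.

```python
import numpy as np


def input_dependent_weights(W, x):
    """Eq. (37): w_ij = sum_k w_ijk x_k, the matrix written onto the 2-D SLM."""
    return np.einsum('ijk,k->ij', W, x)


def recall_two_stage(W, x):
    """Vector matrix multiplier acting on the input dependent matrix."""
    return np.where(input_dependent_weights(W, x) @ x >= 0, 1, -1)


def recall_direct(W, x):
    """Eq. (36): direct evaluation of the quadratic form."""
    return np.where(np.einsum('ijk,j,k->i', W, x, x) >= 0, 1, -1)


rng = np.random.default_rng(0)
N = 5
W = rng.integers(-3, 4, size=(N, N, N))   # arbitrary stand-in for the stored tensor
x = rng.choice([-1, 1], size=N)

print(np.array_equal(recall_two_stage(W, x), recall_direct(W, x)))
```

The two routes differ only in the order of the summations, which is why the optical system can trade a 2-D electronically addressed input for a 1-D input plus an optically addressed 2-D SLM.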

The $N \to N^2$ mapping technique can be used in conjunction with its inverse, the $N^2 \to N$ mapping, to implement the quadratic outer product memory using two volume holograms, a 1-D electronically addressed SLM, and an optically addressed 2-D SLM. Shown in Figure 12 is a schematic diagram of such a system. The first hologram is prepared with the multiple exposure scheme discussed earlier (Figure 8a) where for each exposure, a memory vector in the one-dimensional input array and one point in the two-dimensional ($N \times N$) input training array are turned on simultaneously. The second hologram is prepared by a similar procedure except that the associated output vectors are recorded in correspondence to each point in the two dimensional training plane. After the holograms are thus prepared, an input vector is loaded into the one-dimensional input array and the correlations between it and the $M$ memory vectors are displayed in the output plane (Athale, Szu, & Friedlander, 1986; Owechko, Dunning, Marom, & Soffer, 1987; Paek & Psaltis, 1987). An optically addressed SLM can be used to produce an amplitude distribution which is the square of the incident correlation amplitudes. The processed light then illuminates the second hologram which serves as an $M \to N$ interconnection, each correlation peak in the SLM plane reading out its corresponding memory vector and forming a weighted sum of the stored memories on the one dimensional output detector array. This is a direct optical implementation of the system shown in block diagram form in Figure 5 with the 2-D SLM performing the square law nonlinearity at the middle plane and the two volume holograms providing the interconnections to the input and output.
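The correlate, square, and sum sequence just described can be sketched numerically and compared against the tensor form of (35) and (36). Matrix products stand in for the two holograms and an elementwise square stands in for the SLM; these substitutions, and all names, are assumptions of the sketch.

```python
import numpy as np


def preactivation_two_holograms(X, Y, x):
    """Correlate with the M memories, square, then form the weighted sum of outputs."""
    corr = X @ x             # first hologram: M inner products x^m . x
    squared = corr ** 2      # optically addressed SLM: square law
    return Y.T @ squared     # second hologram: M -> N weighted sum


def preactivation_tensor(X, Y, x):
    """Same quantity computed through the full tensor w_ijk of Eq. (35)."""
    W = np.einsum('mi,mj,mk->ijk', Y, X, X)
    return np.einsum('ijk,j,k->i', W, x, x)


rng = np.random.default_rng(0)
N, M = 12, 5
X = rng.choice([-1, 1], size=(M, N))
Y = rng.choice([-1, 1], size=(M, N))
x = rng.choice([-1, 1], size=N)

a = preactivation_two_holograms(X, Y, x)
b = preactivation_tensor(X, Y, x)
print(np.array_equal(a, b))
```

The two-hologram route needs storage for only $2MN$ numbers rather than $N^3$, at the price of supporting only the outer product form of the weights. Replacing the square with the identity gives the linear outer product memory.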

4.2 Planar Hologram Systems

While not having the extra dimension to directly implement the three dimensional interconnection tensor for general quadratic memories, planar holograms can nevertheless implement the outer product quadratic memory in a way similar to the one used in the system just described. The planar holographic system is shown in Figure 13. Here, the information is stored in the two multichannel 1-D Fourier transform (FT) holograms, the first of which contains the 1-D FTs of the $M$ memory input vectors and the other, the FTs of the associated output vectors (Psaltis & Hong, 1987). The first part of the system is a multichannel correlator which correlates the input against each of the $M$ memory vectors. At the correlation plane, the $M$ correlation functions stacked up vertically are sampled at $x = 0$ with a slit to obtain the required inner products which are then squared by the SLM. Each resulting point source of light is then collimated horizontally and imaged vertically onto the second hologram to illuminate that portion which contains the corresponding output vector. The final stage computes the FT of the light distribution just following the second hologram to produce the weighted sum of the vectors at the output detector array. It is interesting to note that if the SLM is removed from the correlation plane, this system reduces to the linear outer product memory.

Notice that in this system if the input pattern shifts horizontally then the correlation peak also shifts in the correlation plane and it is blocked by the slit that is placed there. Therefore shifted versions of the input vector are not recognized, as expected. Shift invariance, where the shifted versions of the memory vectors are recognized and their associated outputs, shifted by the same amount as the input, are retrieved, can be built into this system by simply lengthening the input SLM and the output detector array to accommodate the shifts and removing the slit in the correlation plane. The resulting system treats each of the $2N - 1$ shifted versions of the memory vectors as a new memory and as a result, the increased capacity of the quadratic memory over the linear one (by a factor of $N$) is expended to provide invariant operation.
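The role of the slit can be illustrated with a discrete correlation. The sketch below is a simplified one-channel model: the lengthened input plane is represented by a zero-padded array of length $3N - 2$ so that all $2N - 1$ shifts fit, which is an assumption of this example, and the names are illustrative.

```python
import numpy as np


def correlation_plane(field, memory):
    """c[k] = sum_n field[n + k] * memory[n], one sample per relative shift."""
    return np.correlate(field, memory, mode='valid')


rng = np.random.default_rng(1)
N = 16
memory = rng.choice([-1, 1], size=N)

shift = 5                               # horizontal shift of the input pattern
center = N - 1                          # index of the unshifted position
field = np.zeros(3 * N - 2, dtype=int)  # lengthened input plane
field[center + shift:center + shift + N] = memory

c = correlation_plane(field, memory)    # 2N - 1 correlation samples
peak = int(np.argmax(c ** 2))           # location of the correlation peak

print(len(c), peak - center, c[peak], c[center])
```

The peak of height $N$ moves with the input, so a slit that passes only the sample at `center` sees a weaker value for any nonzero shift. Removing the slit lets the squared peak address the second hologram from the shifted position, which shifts the retrieved output by the same amount.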